SALARY GROWTH AS A CRITERION OF CAREER PROGRESS

THOMAS L. HILTON and WILLIAM R. DILL

Carnegie Institute of Technology

As a possible improvement on absolute salary as a criterion, the authors
computed the annual percentage growth of the salaries of 143 engineering
graduates employed in industry. Although 1st-year salaries increased from
1950 to 1955, and 1957 salaries varied with years of service, the growth
rates were homogeneous. The rates for different professional groups were
markedly different. 1st-year salary and salary growth were unrelated.
Growth was related to academic grades, but absolute salary unexpectedly
had a stronger relationship. Salary growth has some useful properties,
but it is not uniformly applicable.

The shortcomings of salary as a measure of a man’s progress are well
known (Bechtoldt, 1947; Bellows, 1941; Hilton, 1961; Patterson, 1946;
Thorndike, 1957; Toops, 1944). How his salary is set by his superiors may
be unrelated to the value of his contribution to the organization. It
may, for instance, depend primarily on his years of service, as in the
case of most civil servants. His age or educational level may be the
major determinant. Or, as Stark (1959) has pointed out in discussing the
weaknesses of organization rank as a criterion, prejudices, the chance
availability of openings for promotion, sponsorship, and internal
politics may strongly influence a man’s salary.

In addition, situational factors may preclude his receiving a favorable
evaluation. The level of all salaries may be low in his particular branch
of industry. His company’s ability to pay for professional skills may be
depressed and the supply of skilled people high. This paper concerns an
effort to correct for several of these factors and describes some of the
problems encountered.

Hypotheses. To adjust for differences in absolute level of salary from
one company to another and one industry to another, an obvious step is to
compute the difference between starting salary and current salary. Then
one may divide by the number of years of employment to obtain the average
annual increment. A criticism of this method, however, is that as a model
of the salary setting process it is not consistent with actual practices.
The observations of the authors are that in determining the annual
increase in an employee’s salary the factor which receives the most
attention is the amount of the increment relative to the man’s current
salary. In other words, the adequacy of a salary increase tends to be
evaluated in terms of the percentage growth that it represents. An
increase of $800, for example, is regarded as a substantially more
liberal increase for a man earning $5,000 than for a man earning $20,000.
The percentage growth in salary may be psychologically more relevant and
significant to an employee as a measure of his success than the absolute
salary level or than the summed increments in salary over a period of
years. Also it is probably the relative increment which enters into the
social comparisons studied recently by Patchen (1961) and earlier by
Festinger (1954).

These observations suggested the following 
hypotheses: 


1. For at least the first 6 years of employment, annual salary growth
rate is independent of number of years employed. Casual observation
indicates that salaries of industrial professional employees do not
continue to grow at a constant rate throughout their years of employment
and therefore the hypothesis was restricted to the first 6 years, the
time span covered by the data available in this study.

2. In the same way that starting salaries 
and thereby the first year salaries of gradu- 
ates of different professional programs (i.e., 
civil engineering, industrial management, etc.) 
are different, the annual salary growth rates 
of these groups will be different. 

3. For single individuals salary growth is 
positively related to first year salary. The 
thinking here was that to the extent that 
personal evaluations determine first year sal- 
aries they would continue to determine salary 
growth. The appraisal of a college graduate’s 
ability should influence the salary he is of- 
fered for his first year of employment, and 
ceteris paribus, one would expect him to per- 
form in a manner consistent with the initial 
appraisal or, at least, one would expect later 
appraisals to be consistent with the first one. 
Thus, a positive relation should exist between 
first year salary and salary growth. Since first 
year salary enters into the computation of 
salary growth, caution must be exercised in 
interpreting any correlation obtained between the two measures. This
problem will be discussed further shortly.

4. Undergraduate grade average, as a rough measure of each man’s ability
to perform well in competitive situations, is positively related to
salary growth. When the subjects have widely differing years of
industrial experience, the correlation between grade average and growth
should be stronger than that between grade average and absolute salary
level. This is because the correlation between grades and absolute salary
tends to be obscured by the correlation between absolute salary and years
of experience mentioned earlier.

METHOD 

Subjects. Salary data were obtained for a sample of 143 male college
graduates selected from a large sample of graduates who were part of
another study conducted by the authors. All received BS degrees in
engineering from the same eastern college from 1950 to 1955 and were
employed by industrial firms. Men who received advanced degrees were not
included in the sample. Seven were employed in administrative positions;
93 in construction, maintenance, design, or field work; 29 in research
and development; and 14 in production and operations. Seventy-two percent
were veterans and 74% were married at the time they completed the
questionnaires.

Thus this group is not a representative sample of 
a typical college class. It might rather be described 
as a group of young well-educated engineers work- 
ing in American industry. 

Data. The data were obtained from one page of a 
long questionnaire which was sent to the subjects 
in May 1958 as part of a larger study. The exact 
wording of the relevant questions was as follows: 

1. What was your total basic salary for the calendar year 1957? (If you
moved from one position to another or if your rate was changed, please
give the actual total received during the year. If employed for less
than the whole year, please give the number of months.)

2. What was the total of your bonus and/or incentive payments (excluding
base salary) during 1957?

3. What were the comparable figures for your first year of full-time
employment after completing undergraduate work or after military
service?
Basic Salary $______
Bonus and/or incentive payments $______
Calendar year in which salary was received ______

4. Did you have (in 1957) additional taxable income from other sources
in excess of $500?

The subjects were instructed to enclose the page in a small envelope
marked “Confidential,” to seal it, and to return it in the same envelope
as the questionnaire. Note that the authors asked for basic salary and
bonuses for the “first year of full-time employment after completing
undergraduate work or after military service.” This figure is likely to
be different from starting salary, a datum which has been used in other
studies in this area. Note also that the information was treated as
confidential and that unearned income was not included in the reports.

How to handle bonuses was a problem. Are they purely windfalls or are
they related to the quality of an individual’s performance? In this study
it was decided to focus on the sum of salary and bonus, primarily because
earlier studies by the authors indicated that for all intents and
purposes the bonuses are usually equivalent to salary and should be
treated as such. Wherever the word salary is used, therefore, it should
be construed to include bonuses.

Salary growth rate was computed by means of the conventional formula for
compound interest rate:

$$ r = \left[ \left( \frac{S_{1957}}{S_{1}} \right)^{1/n} - 1 \right] \times 100 $$

where r = annual rate of increase (percentage/year),
S_{1957} = 1957 salary + bonus (dollars),
S_{1} = first year salary + bonus (dollars), and
n = 1957 minus the year in which first salary was received (years).
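
The compound interest computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `annual_growth_rate` and its argument names are hypothetical, and the 1950 example uses the group medians from Table 1 for convenience, whereas the authors computed a rate for each individual.

```python
def annual_growth_rate(first_salary, final_salary, years):
    """Compound annual growth rate in percent per year.

    Hypothetical helper: first_salary and final_salary are salary plus
    bonus in dollars, years is the number of years between them.
    """
    if first_salary <= 0 or final_salary <= 0:
        raise ValueError("salaries must be positive")
    if years <= 0:
        raise ValueError("years must be positive")
    return ((final_salary / first_salary) ** (1.0 / years) - 1.0) * 100.0


# The same $800 raise is a much larger percentage step on a $5,000 base
# than on a $20,000 base (the example given in the Hypotheses paragraph).
small_base = annual_growth_rate(5000, 5800, 1)
large_base = annual_growth_rate(20000, 20800, 1)

# Illustration only: Table 1 medians for the 1950 group ($3,500 and $8,740)
# over n = 1957 - 1950 = 7 years.
seven_year = annual_growth_rate(3500, 8740, 7)

print(round(small_base, 1), round(large_base, 1), round(seven_year, 1))
```

Because the rate is a ratio raised to the power 1/n, it is unaffected by the absolute level of the two salaries, which is the property the authors wanted from the measure.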

The distributions for salary growth, first year salary and 1957 salary
are given in Figure 1. Because of the skewness of the distribution,
medians have been used as the measure of central tendency except where
noted.

[Figure 1. Distribution of salaries and salary growth. Panels: first year
salary, 1957 salary (thousands of dollars), and salary growth; N = 143;
vertical axis, frequency.]

RESULTS

Table 1 gives the first year salaries, the 1957 salaries, and the annual
growth rate for the graduates arranged in accordance with the calendar
year in which the “first year salary” was received. It should be noted
that this year in which the first year salary was received is not
necessarily the year the subjects completed their undergraduate work for
many of them served in the armed services before taking an industrial
position. From Table 1 it is clear that salaries received during the
first year of employment steadily increased during the 1950-55 period and
that the 1957 salaries of the different groups increased with the time
which elapsed since the first year in industry. Salary growth, on the
other hand, remains fairly constant. When a chi square test is performed
on the number of individual growth rates within each yearly group which
are above the grand median and the number below, the growth rates prove
to be homogeneous (p [illegible]).

The second hypothesis (about different pro-
TABLE 1

SALARIES AND GROWTH RATE BY YEAR IN WHICH FIRST YEAR SALARY WAS RECEIVED

First year     First year salary and bonuses    1957 salary and bonuses          Annual growth rate (percentage)
in industry    Median  Upper 25%  Lower 25%     Median  Upper 25%  Lower 25%     Median  Upper 25%  Lower 25%

1950 3500 4000 3200 8740 9540 8000 3 5 12.0 
(N = 35) 
1951 4415 3500 8440 9700 7400 3.5 5 ee 
1950 39060 82 9300 7965 
5000 4470 7500 


5336 4500 . 6812 


5400 4900 3: 6440 
TABLE 2

MEDIAN SALARIES AND GROWTH RATE BY UNDERGRADUATE MAJOR FOR SUBJECTS
ENTERING INDUSTRY IN 1950 AND 1951

Undergraduate major       First year salary   1957 salary    Annual growth
                          and bonuses         and bonuses    rate
Chemical engineering      3720                7461           13.0
Civil engineering         4000                8612           10.3
Electrical engineering    3544                8758           16.0
Industrial management     3640                8046           12.0
Mechanical engineering    3630                9171           14.6
Balance                   3660                8400           14.0

fessional groups) was confirmed, as Table 2 shows. Because first year
salaries have increased each year, the authors focused attention on a
limited time segment, namely, 1950-51. Otherwise both first year salary
and 1957 salary largely reflect differences in the average longevity of
the members of the groups. The chi square test indicates that the growth
rates of the different professional groups are not homogeneous (p < .01).

Hypothesis 3 (about first year salary) was not confirmed, as is most
clearly shown by the correlations in Table 3. The sample for these
correlations is the group of subjects who entered industry in 1950.

As mentioned earlier there is a part-whole problem in interpreting the
correlation between first year salary and salary growth. One way to avoid
the problem is to examine the correlation between first year salary and
1957 salary for a group of men who entered industry at approximately the
same time. If there is a nonspurious correlation between first year
salary and growth there will also be a correlation between first year
salary and 1957 salary. If there is no correlation between these we can
be confident that there is no correlation between first year salary and
salary growth. The obtained correlation of zero indicates that even when
number of years of employment is held constant the 1957 salary of the men
cannot be predicted from a knowledge of their first year salary. The high
negative correlation between growth and the first year salary will be
discussed shortly.
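
The part-whole problem can be made concrete with a small simulation. The sketch below is not the authors' analysis: the salaries are randomly generated, the ranges and sample size are arbitrary assumptions, and `pearson` is a hypothetical helper written out by hand. It shows that when first year salary and final salary are drawn independently, the growth rate computed from the two is still negatively correlated with first year salary, simply because first year salary is the denominator of the growth ratio.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the run is repeatable
years = 7                # same entry year for everyone (assumption)

# Simulated, independent salaries: the ranges are illustrative only.
first = [rng.uniform(3000, 4500) for _ in range(2000)]
final = [rng.uniform(7500, 9500) for _ in range(2000)]

# Growth rate computed with the compound interest formula.
growth = [((f / s) ** (1 / years) - 1) * 100 for s, f in zip(first, final)]

r_first_final = pearson(first, final)    # close to zero by construction
r_first_growth = pearson(first, growth)  # strongly negative artifact

print(round(r_first_final, 2), round(r_first_growth, 2))
```

Under these assumptions a zero correlation between the two salaries coexists with a large negative correlation between first year salary and growth, which is the pattern reported for the 1950 group.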

The fourth hypothesis (about undergradu- 
ate grades) is given some support by the cor- 
relations in Tables 3 and 4 but is not con- 
firmed entirely by them. Contrary to predic- 
tion, the correlation between grades and 1957 
salary is larger for the total sample than it is 
for the 1950 sample alone. 

Additional results. In the way of exploration, a number of other
categorizations of the total sample were made to check some possible
sources of variance in growth rates. None of these splits of the sample
revealed any appreciable differences. They included married versus
unmarried, whether the man was still with his first employer or not
(apparently moving from one employer to another does not on the average
accelerate salary growth), activity in various organizations, number of
hours worked per week, satisfactions experienced, and problems
encountered.

TABLE 3

PRODUCT-MOMENT CORRELATIONS AND RELATED STATISTICS BETWEEN SALARY
MEASURES AND GRADES FOR 1950 GRADUATES (N = 35)

[Table body not legible in the source; the only clear surviving entry is
13.5 for salary growth.]

TABLE 4

PRODUCT-MOMENT CORRELATIONS AND RELATED STATISTICS BETWEEN SALARY
MEASURES AND GRADES FOR TOTAL SAMPLE (N = 143)

Measure              M
First year salary    4372.7
1957 salary          8464.6
Salary growth        14.23
Grade average        2.8

[Correlation entries not legible in the source. Table notes mark
significance at the .05 and .01 levels of confidence.]


DISCUSSION 


The fact that the median salary growth 
rates are homogeneous for the subsamples entering
employment in different years makes
the statistic useful as a criterion when one is 
dealing with subjects who entered industry at 
different times. When the subjects all entered 
industry in the same year a choice has to be 
made as to which is the better criterion. If 
one is interested in absolute level of salary 
regardless of starting salary, then the termi- 
nal salary is probably the best index. On the 
other hand one may wish to assign a high 
criterion score to subjects who for one reason 
or another started at low levels, but relative 
to their base have gained at a high rate, even 
though they did not reach the same absolute 
level as the first. 
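
The homogeneity check referred to here, a chi square test on the number of growth rates above and below the grand median in each entry-year group, can be sketched as follows. The growth rates in the example are invented for illustration, the function name `median_test` is hypothetical, and the statistic is computed by hand rather than with a statistics library; ties with the grand median are counted as "not above", which is one of several possible conventions.

```python
from statistics import median


def median_test(groups):
    """Chi square statistic for counts above vs. not above the grand median.

    groups: list of lists of growth rates, one list per entry-year group.
    Returns (chi2, degrees_of_freedom, observed_counts).
    """
    pooled = [x for g in groups for x in g]
    grand = median(pooled)
    # One row per group: [number above grand median, number not above].
    observed = [[sum(x > grand for x in g), sum(x <= grand for x in g)]
                for g in groups]
    total = len(pooled)
    col_totals = [sum(row[j] for row in observed) for j in range(2)]
    chi2 = 0.0
    for row in observed:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / total
            chi2 += (row[j] - expected) ** 2 / expected
    return chi2, len(groups) - 1, observed


# Invented growth rates (percent per year) for three entry-year groups.
groups = [
    [12.0, 14.2, 14.6, 15.2, 16.1, 11.8],
    [12.5, 13.0, 14.4, 15.0, 13.8, 16.5],
    [11.5, 13.9, 13.6, 14.9, 12.9, 15.8],
]
chi2, df, observed = median_test(groups)
print(round(chi2, 2), df, observed)
```

With 2 degrees of freedom the .05 critical value of chi square is about 5.99, so a statistic well below that, as in this invented example, would be read as homogeneous growth rates across groups.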

Table 5 shows some of the interpretational problems in this area.
Individual A started low but relative to this low level received a high
increment ($600) giving him the highest growth figure. Apparently the
presence in the sample of a group of subjects of this type contributed to
the high negative correlation between first year salary and salary
growth. Individual B had the highest increment ($640) but because his
first year earnings were higher than A’s his growth rate is lower.
Individual C has the highest salary but the increment was low and
similarly the growth rate. It is interesting to note that if the salaries
of A and C continue to grow at the present rate, A will overtake C in
approximately 5 years.

TABLE 5

HYPOTHETICAL SALARIES AND GROWTH RATE

Subject    First year salary    Salary one year later
A          $3000                $3600
B          $4000                $4640
C          $5000                $5500
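
The arithmetic behind Table 5 can be checked with a short sketch. The salaries are the hypothetical ones given in the table; the variable names are illustrative, and the overtaking time is counted from the later of the two salaries, which is an assumption about how the authors reckoned it.

```python
import math

# Hypothetical subjects from Table 5: (first year salary, salary one year later).
subjects = {"A": (3000, 3600), "B": (4000, 4640), "C": (5000, 5500)}

# Dollar increment and one-year percentage growth for each subject.
increment = {name: later - first for name, (first, later) in subjects.items()}
growth = {name: (later / first - 1) * 100
          for name, (first, later) in subjects.items()}

# If A and C keep their present rates, A passes C when
# 3600 * g_A**t = 5500 * g_C**t, i.e. t = log(5500/3600) / log(g_A/g_C).
g_a = subjects["A"][1] / subjects["A"][0]
g_c = subjects["C"][1] / subjects["C"][0]
years_to_overtake = math.log(subjects["C"][1] / subjects["A"][1]) / math.log(g_a / g_c)

for name in subjects:
    print(name, increment[name], round(growth[name], 1))
print(round(years_to_overtake, 1))
```

B has the largest dollar increment but A has the largest growth rate, and the overtaking time comes out a little under five years, in line with the "approximately 5 years" stated in the text.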

Why there is no correlation between first year salary and 1957 salary is
not obvious. A possible explanation is that first year salaries are
influenced primarily by supply and demand in the professional marketplace
whereas later salaries are influenced more by the on-the-job performance
of the man. But we know that a man’s salary is not based entirely on his
performance; for example, the different professional specialties
(chemical engineering, civil engineering, etc.) have differing median
growth rates as seen in Table 2. It is well known that in any year there
will be sizable differences in the average starting salaries for
different professional specialties. Evidently there are also differences
in the rates of salary growth among the groups. These differences are
probably attributable to differences in demand among the specialties. If
two men are equally competent and at the same salary level, the one who
has the professional training which is in higher demand in the general
market is likely to receive a higher increase in his salary. Since there
are annual fluctuations in the demand for different specialties, market
pressures probably diminish any positive correlation between first year
salary and salary growth.

The absence of a strong correlation between 
grades and salary growth is no doubt related 
to the lack of correlation between starting 
salary and 1957 salary. Also it appears to 
have resulted in part from the peculiar nu- 
merical properties of salary growth. Inspec- 
tion of scatter plots of the correlation be- 
tween growth and grades indicated that there 
was a substantial number of subjects with 
high grades whose salary history appeared to 
be similar to Individual C in the hypothetical 
example given: their 1957 salary was high 
but their first year salary was also high re- 
sulting in their growth rates being relatively 
low. One possibility is that high ability men 
with high grades and high starting salaries 
are prevented from receiving salary increases 
consistent with their ability by salary ceil- 
ings imposed by the classification systems 
prevalent in most large American corpora- 
tions. Since absolute salary levels have also 
risen rapidly, highly paid young college 
graduates bump rapidly against informal 
norms about what are proper salary limits 
for men of their age and experience. 


SUMMARY AND CONCLUSIONS 


It would be good if there were a criterion 
of career progress which was not subject to 
the shortcomings of salary, particularly
its dependence on longevity and its variation 
between industries and between professional 
specialties. In the past the authors have at- 
tempted to use peer ratings, supervisory rat- 
ings, measures of administrative responsibility 
(e.g., number of men supervised), and indices 
of organizational attainment (e.g., level in 
the organizational hierarchy) to name a few 
alternatives. But each of these has deficiencies 
which in the experience of the authors are 
sufficiently serious to justify the use of salary 
as an alternative measure despite its imper- 
fectness. 


The authors designed the salary growth 
measure as a way of avoiding some of the 
shortcomings of absolute salary. The finding 
that the growth figure was independent of 
number of years of employment for at least
the first 6 years indicates that the authors 
were to some extent successful. There were dif- 
ferences, however, among the median growth 
rates when the subjects were grouped accord- 
ing to their undergraduate major. Thus salary 
growth is an advantageous measure when com- 
paring subjects with differing years of indus- 
trial experience but when comparing subjects 
in different professional specialties it has the 
same shortcomings as absolute salary. 

Obviously the salaries of most men do not 
continue to grow at constant rates through 
their years of industrial employment. It would 
not be meaningful, therefore, to compare the 
growth rates of men with 25 or 30 years of 
tenure with those of new employees. Also the 
fact that growth rates are highly sensitive to 
differences in first year salaries may make 
the results deceptive for certain comparisons. 
These observations suggest that although 
salary growth rate has some useful proper- 
ties as a measure it must be used with dis- 
cretion. Whether it is used in a particular 
study depends, as with other measures of 
career progress, on the characteristics of the 
sample in question and on the precise inter- 
ests of the investigator. 
RELIABILITY OF ABSENCE MEASURES

EDGAR F. HUSE¹

Raytheon Company, Boston, Massachusetts

AND ERWIN K. TAYLOR

Personnel Research and Development Corporation, Cleveland, Ohio

4 different absence measures were defined and examined: attitudinal
absences, absence frequency, absence severity, and medical absences.
Attitudinal absences and absence frequency were sufficiently reliable to
be used as criterion measures; absence severity and medical absences were
considered to be too unreliable for use as criterion measures.


Absenteeism has been frequently studied as 
either a criterion or a predictor measure. How- 
ever, relatively little attempt has been made 
to systematically examine the reliability of 
absenteeism as such. 

The present study was undertaken to in- 
vestigate the reliability of absenteeism as a 
criterion measure. Specifically, the study was 
designed to determine which of several differ- 
ent types of absence measures had sufficient 
reliability to be used as criterion variables. 


METHOD 


Subjects. The subjects (Ss) were 393 truck drivers of a large oil
company. At the time of the study, the Ss were engaged in delivering
fluid petroleum products, primarily gasoline, to retail and industrial
outlets, usually service stations. All drove tractors and semitrailers.
They reported to terminal superintendents in each of 12 marketing
divisions located in a single midwestern state. The mean age of the Ss
was 38, with the age range from 23 to 64 years. The range of job
experience as truck driver with the firm was from 1 to 26 years, with a
mean experience as driver of 8 years.

Procedure. A recording form was used to record all nonoccupational
illness or injury absences for the full years 1957 and 1958. The
recording form was designed to provide information regarding the number
of 1-day absences, the number of 2-day absences, and so on, up to
absences of 6 months and over.

¹ Formerly with the Standard Oil Company, Cleveland, Ohio.

Definition of absenteeism. Prior to analysis of the data, several
different types of absence measures were defined: (a) absence frequency:
total number of times absent; (b) absence severity: total number of days
absent; (c) attitudinal absences: frequency of 1-day absences; and (d)
medical absences: frequency of absences of 3 days or longer.

TABLE 1

INTERCORRELATION OF ABSENCE VARIABLES

[Matrix of the eight variables (frequency, severity, attitudinal, and
medical absences, each for 1957 and 1958); the individual coefficients
are not legible in the source.]

These definitions were built from the assumption 
that absence, as such, can be composed of both atti- 
tudinal absences and medical absences. In other 
words, in the former the worker avoids coming to 
work; in the latter, the worker is sufficiently ill that 
he is unable to come to work. Any individual ab- 
sence, of course, can be a combination of different 
factors. No attempt was made in this study to prove 
or disprove these assumptions. Rather, the four dif- 
ferently defined absence measures were examined to 
determine their relative reliability. 
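
The four definitions above translate directly into counts over one driver's absence record for a year. The sketch below assumes a hypothetical record format, a list with one entry per absence giving its length in days; the function name and the sample record are illustrative and do not come from the study's recording form.

```python
def absence_measures(spells):
    """Compute the four absence measures for one subject and one year.

    spells: list of absence lengths in days, one entry per absence
    (hypothetical record format).
    """
    return {
        "frequency": len(spells),                        # times absent
        "severity": sum(spells),                         # total days absent
        "attitudinal": sum(1 for d in spells if d == 1), # 1-day absences
        "medical": sum(1 for d in spells if d >= 3),     # 3 days or longer
    }


# Invented record: six absences lasting 1, 1, 2, 5, 1, and 3 days.
driver_1957 = absence_measures([1, 1, 2, 5, 1, 3])
print(driver_1957)
```

Note that under these definitions a 2-day absence adds to frequency and severity but is counted as neither attitudinal nor medical.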

Analysis of data. The intercorrelation matrix for the four different
absence measures was computed for the years 1957 and 1958, using
product-moment correlation coefficients. The matrix is given in Table 1.


RESULTS AND DISCUSSION 


As shown in Table 1, total absence frequency has the highest reliability
for the 2 years, .61. Attitudinal absences have a reliability of .52,
while severity and medical absences have reliabilities of .23 and .19,
respectively.
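
Reliability here is the product-moment correlation between the same measure in the two years. A minimal sketch of that computation follows; the absence counts are invented for six hypothetical drivers and the `pearson` helper is written out by hand, so the resulting coefficient has no bearing on the values reported in the study.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented absence frequencies for the same six drivers in each year.
frequency_1957 = [2, 5, 1, 7, 3, 4]
frequency_1958 = [3, 4, 1, 6, 2, 5]

reliability = pearson(frequency_1957, frequency_1958)
print(round(reliability, 2))
```

The same function applied to severity, attitudinal, or medical counts for the two years would give the other three reliabilities.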

The relationship between the variables is even more illuminating.
Absence frequency is not only the most reliable measure; it appears to
hold more variance in common with the other variables. On the other hand,
medical absences are relatively unique with little common variance with
any other measures, with the exception of severity. As expected,
attitudinal absences show a high degree of relationship with total
absence frequency, a much lower relationship with severity, and an even
lower relationship with medical absences.


The low reliability of medical absences and absence severity makes them
suspect for use as criterion variables. The reliability of both
attitudinal absences and absence frequency is sufficiently high that
either can be used as a criterion. The question of which one should be
used in any particular study would, of course, be dependent upon the aims
of the investigator.


(Early publication received October 16, 1961) 
2 hypotheses derived from dissonance theory were tested: (a) when a
person is paid by the hour his productivity will be greater when he
perceives his pay as inequitably large than when identical pay is
perceived as equitable, and (b) when a person is paid on a piecework
basis his productivity will be less when he perceives his pay is
inequitably large than when he perceives identical pay as being
equitable. The first hypothesis was sustained (p [illegible]) in a
laboratory experiment in which 11 male college Ss earned $3.50 per hour
and were induced to feel overpaid and 11 control Ss earned $3.50 per hour
and were induced to feel fairly paid. The second hypothesis was sustained
(p < .01) in a factorial design study in which 36 Ss were paid either
$3.50 per hour or [amount illegible] cents per piece, and felt either
equitably paid or inequitably overpaid. In both studies an identical
task, in which Ss interviewed the general public, was used.

1 The experiments reported herein are part of a program of laboratory and
field experiments on wages and productivity by the Behavioral Research
Service, General Electric Company, undertaken with the cooperation of the
Research Center for Industrial Behavior, New York University.

When a person performs work in exchange for pay it may be assumed that he
has cognitions about what he contributes, his inputs, and what he
receives for performing the work, his outcomes. In addition, the person
may be assumed to have cognitions about the inputs and outcomes of others
whom he uses as referents in making social comparisons. Letting the terms
Person and Other designate the focal individual and his social referent,
respectively, we may state that cognitive dissonance exists for Person
whenever his cognitions of his job inputs and/or outcomes stand
psychologically in obverse relation to his cognitions of the inputs
and/or outcomes of Other (Adams, 1961; Festinger, 1957). Thus, for
example, if Person were less well qualified for a job than Other, but
earned the same pay, Person would experience cognitive dissonance. (In
this illustration dissonance would also exist for Other, if Person were
his social referent, but we shall focus only on Person in this paper.)

If Person experiences dissonance, he will attempt to reduce it and
establish consonance between his job inputs and outcomes, in relation to
the inputs and outcomes of Other. Many ways of achieving this are
available to Person, but we shall be concerned here with changes in
productivity inputs. More specifically, the two experiments reported here
will give attention to dissonance-reducing productivity changes when
Person’s outcomes are perceived as too great in comparison with his own
inputs and the inputs and outcomes of Other.

If Person is paid by the hour and a fairly large number of units may be
produced in 1 hour, we predict that his productivity will be greater when
his outcomes are perceived as too great than when identical outcomes are
perceived as equitable. The rationale is that, if other means of reducing
dissonance are unavailable, increasing productivity will increase
Person’s inputs and bring them in line with his outcomes and Other’s
inputs and outcomes. If, however, Person is paid on a piecework basis, we
predict that his productivity will be smaller when his outcomes are too
large than when the same outcomes are perceived as fair. In this case the
rationale is that because dissonance is associated with each unit
produced, dissonance will increase as work proceeds; hence, Person will
strive not so much to reduce dissonance as to avoid increasing it. In
other words, Person will tend to restrict production.

The predictions are tested in two experi- 
ments. The first experiment, which temporally 
preceded the second in the winter of 1960, is 
a test of the first prediction. The second, per- 
formed in late spring 1961, tests the two 
predictions simultaneously and, additionally, 
replicates the first experiment. 


EXPERIMENT I
Method 


Twenty-two male students at New York Univer- 
sity were hired through the Placement Service for 
part-time temporary work as interviewers. The rate 
of pay announced was $3.50 per hour and subjects 
(Ss) were given the impression that interviewing 
would take place for an extensive period of “several 
months.” When they reported to their prospective 
“employer” they were assigned randomly in equal 
numbers to the experimental dissonance condition 
(N = 11) and to the control condition (N = 11), as
described below. 

1. S filled out a Personal Information Questionnaire
requesting demographic, educational, and previous
employment data.

2. The experimenter (E) studied S’s background
in his presence and, depending on the condition to 
which S had been assigned, induced the perception 
that S was qualified for the job or that he was not. 

The differential inductions for experimental and 
control Ss were as follows: 

Experimental Ss: 

You don’t have any (nearly enough) experience 
in interviewing or survey work of the kind we’re 
engaged in here. I specifically asked the Placement 
Service to refer only people with that kind of experience.
This was the major qualification we set.
I can’t understand how such a slip-up could have 
occurred. It’s really very important for research of 
this kind to have people experienced in interview- 
ing and survey techniques. [Agonizing pause.] 

We're dealing with a limited alternative open 
end kind of questionnaire. There’s no “correct” an- 
swer to an item. Research in this area has shown 
that the nature of the response elicited by a skilled 
and experienced interviewer is more accurate and 
representative of the respondent’s sentiments and 
differs substantially from the responses elicited by 
inexperienced people.

Who interviewed you at Placement? [E scans 
the New York University phone directory, picks 
up telephone receiver and dials a number. Gets 
busy signal and slams receiver down. Pause, while 
E thumbs papers and meditates.] 

I guess I'll have to hire you anyway, but please 
pay close attention to the instructions I will give 
you. If anything I say seems complicated, don’t 
hesitate to ask for clarification. If it seems simple, 
pay closer attention. Some of this stuff, on the
surface, may appear to be deceptively easy. 

Since I’m going to hire you, I'll just have to 
pay you at the rate we advertised, that is, $3.50 
per hour. 

Control Ss: 

Well, this is very good. We can use you for this 
work. You meet all the qualifications required for 
the job, which is good, because we often have to 
turn people down because they’re poorly qualified. 
Poorly qualified people can really make a mess of 
a study of this kind. Why even the Census, where 
they were dealing with simple demographic ma- 
terial, got fouled up. They hired inadequately 
qualified people, some of their housewives for ex- 
ample, and the result was the gross deficiencies in 
their data that were so widely criticized in the 
press, if you recall. 

Well, anyway, I’m pleased you have the back- 
ground we’re looking for. 

So far as pay is concerned, the people at the 
Placement Service have probably advised you that 
we pay $3.50 per hour. This rate of pay is stand- 
ard for work of this kind performed by people 
with your qualifications.


Referring to the theoretical statements made ear- 
lier, it may be noted that in these instructions S is 
Person, while unnamed interviewers at large consti-
tute a generalized Other. S’s alleged qualifications are
the inputs of Person and the pay of $3.50 per hour 
is his outcome. In the experimental induction S is 
made to perceive that his inputs are out of line with 
his outcomes and the inputs and outcomes of other 
interviewers. In the control induction, he is made to 
perceive that his inputs are in line with his out- 
comes and the inputs and outcomes of other inter- 
viewers. 

3. E gave S instructions on the interviewing task.
S was to interview adult members of the general 
public for approximately 2.5 hours and was to ob- 
tain approximately equal numbers of interviews with 
male and female respondents. No restrictions were 
placed on where S was to obtain interviews. The in- 
terview was a simple one, requiring respondents to 
associate one of five automobile marks with six brief 
personal descriptions, such as “A professional athlete” 
and “A rising junior executive.” 

The task met the requirements of the experiment 
nicely. It was brief enough so that many interviews 
could be obtained during S’s work period. More im-
portantly, it could be made to be perceived as re-
quiring more skills than S possessed or as requiring
skills commensurate with his qualifications.

All Ss were given interviewing materials and a sup-
ply of 50 blank questionnaires (a number predeter-
mined to be slightly greater than the most efficient
interviewer could use in 2.5 hours). As S departed 
to begin work he was briefly reminded of the rela- 
tion of his qualifications to his pay of $3.50 per 
hour. 

4. When S returned, the time of his return having
been specified by E (e.g., 4:00 p.m.), he was asked
to fill out a brief questionnaire to obtain informa-
tion on his interviewing experience and his reactions
to it. The experiment and its purpose were then ex-
plained to him and he was requested not to divulge
information about it. In only one instance was this
request not observed and the S affected was elimi-
nated from the experiment. Finally, S was paid for
his time at the full rate of $3.50 per hour.


TABLE 1
MEAN PRODUCTIVITY AND MEDIAN DISTRIBUTION OF
EXPERIMENTAL AND CONTROL SUBJECTS

                      Experimental    Control
Cases above median
Cases below median
Mean productivity


RESULTS 

Since there was some variation in the 
amount of time Ss worked—a range of 143- 
165 minutes—and the dependent variable of 
interest was productivity, the number of in- 
terviews obtained per minute was the datum 
used in analysis. The results are given in 
Table 1. As predicted, experimental Ss pro- 
duced significantly more than control Ss. By 
median test² a significance level better than
.05 is achieved (χ² = 4.55, df = 1).
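
The reported statistic can be checked against the design. With 11 Ss per condition, a median split of 8 above and 3 below in one condition and the reverse in the other yields an uncorrected chi square of 4.55. The cell counts in this sketch are an assumption, chosen only because they reproduce the reported value; the entries of the published Table 1 are not legible in this copy, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi square (df = 1) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Assumed median split (not taken from the published table):
# rows = above / below the combined median,
# columns = experimental / control, 11 Ss per column.
ASSUMED_SPLIT = (8, 3, 3, 8)

chi2 = chi_square_2x2(*ASSUMED_SPLIT)
print(round(chi2, 2))
```

Any other split with the same margins (for example 7 and 4) gives a smaller value, so under these assumptions the 8-and-3 split is the only one that matches 4.55.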


EXPERIMENT II 


In this experiment the hypothesis tested 
was that whereas Ss overpaid by the hour 
would show greater productivity than con- 
trols, Ss overpaid on a piece rate would show 
less productivity than controls. Thus, the 
most appropriate test would be one of the in- 
teraction between method of pay and dis- 
sonance. 


Method

Thirty-six male New York University students were
hired through the Placement Service, as in
the previous experiment, for the same interviewing
job and at identical or equivalent rates of pay.
When they reported to the “employer,” they were
randomly assigned in equal numbers to the following
four conditions:

1. H_e: experimental dissonance condition; Ss paid
$3.50 per hour (N = 9)

2. H_c: control condition; Ss paid $3.50 per hour
(N = 9)

3. P_e: experimental dissonance condition; Ss paid
30 cents per interview obtained (N = 9)

4. P_c: control condition; Ss paid 30 cents per
interview obtained (N = 9)

The procedure was identical to that in Experi-
ment I, with the exception that Ss in the P_e and
P_c conditions were paid 30 cents per interview. The
piece rate of 30 cents was determined from the
performance of control Ss in Experiment I, who
would have earned slightly less than 31 cents per
interview, on the average, had they been paid on a
piece rate. A minor, additional modification in pro-
cedure was that Ss were instructed to work 2 instead
of 2.5 hours.

2 In this and the following experiment nonpara-
metric tests were used. The significance levels ob-
tained are approximately the same as that obtained
by t tests and analysis of variance, after transform-
ing productivity data to arc sines. The nonpara-
metric tests, however, are the more appropriate since
the bases of productivity data were not equal. All
tests are two-tailed.


TABLE 2
MEAN PRODUCTIVITY AND MEDIAN DISTRIBUTION OF
HOURLY AND PIECEWORK EXPERIMENTAL AND
CONTROL SUBJECTS

Cases above median
Cases below median
Mean productivity     .2723    .1493
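
The arithmetic behind the 30-cent piece rate described in the Method can be sketched as follows. At $3.50 per hour an S earns about 5.83 cents per minute, so earnings per interview are that figure divided by interviews per minute. The productivity of 0.19 interviews per minute used below is a hypothetical illustration, not a value from the paper, and the function names are assumptions.

```python
HOURLY_RATE_DOLLARS = 3.50  # hourly pay stated in the text


def cents_per_interview(interviews_per_minute, hourly_rate=HOURLY_RATE_DOLLARS):
    """Cents an hourly worker effectively earns for each interview obtained."""
    cents_per_minute = hourly_rate * 100 / 60
    return cents_per_minute / interviews_per_minute


def breakeven_productivity(piece_rate_cents, hourly_rate=HOURLY_RATE_DOLLARS):
    """Interviews per minute at which a piece rate pays the same as the hourly rate."""
    cents_per_minute = hourly_rate * 100 / 60
    return cents_per_minute / piece_rate_cents


# "Slightly less than 31 cents per interview" implies a control productivity
# slightly above this many interviews per minute:
print(round(breakeven_productivity(31), 4))

# Hypothetical productivity, for illustration only:
print(round(cents_per_interview(0.19), 2))
```

On this reading, a pieceworker paid 30 cents per interview would need roughly 0.194 interviews per minute to match the hourly wage.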


RESULTS 


The data are presented in Table 2 and sup-
port the hypothesis. As may be seen, hourly
workers in the dissonance condition had a
higher mean productivity than their controls,
whereas pieceworkers in the dissonance con-
dition had a lower mean productivity than
their controls. Median tests of the difference
between the two hourly conditions and be-
tween the two piecework conditions do not
reach a satisfactory level of significance,
although the directions of the differences are
as predicted. A more powerful and appropri-
ate test is that of the interaction between
conditions of pay and dissonance. The inter-
action, tested by chi square analysis of vari-
ance, is significant at better than the .01 level
(χ² = 7.11, df = 1), as shown in Table 3.
An unpredicted, probably artifactual, finding
is the significantly lower productivity of
pieceworkers (χ² = 4.00, df = 1, p < .05).


TABLE 3
CHI SQUARE ANALYSIS OF VARIANCE

Source
Method of pay
Dissonance
A X B interaction

* p < .05
** p < .01

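
The significance levels quoted for these chi square values can be verified directly. For one degree of freedom the upper-tail probability has a closed form in terms of the complementary error function, so no statistics library is needed. This is a minimal sketch with a hypothetical function name, not the authors' computation.

```python
import math


def chi_square_p_value_df1(chi2):
    """Upper-tail probability of a chi square variate with 1 degree of freedom.

    With df = 1 the variate is a squared standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


for label, value in [("Experiment I median test", 4.55),
                     ("Experiment II interaction", 7.11),
                     ("Experiment II method of pay", 4.00)]:
    print(label, round(chi_square_p_value_df1(value), 4))
```

The interaction value falls below .01 and the other two fall between .01 and .05, in line with the levels reported in the text.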

DISCUSSION

In the experiments reported here we at- 
tempted to extend the implications of cogni- 
tive dissonance theory to a problem that is 
of pervasive concern to many institutions, and 
we have tried to test relevant derivations of 
the theory in such a manner that they bear 
directly on the problem. The problem selected 
was that of wage inequities and their effects 
on worker productivity. The experimental 
situation chosen was a “real” employment 
situation in which workers performed a real 
task for real wages. It may be noted that our 
Ss uniformly believed they had been hired 
for and had worked on a real job. In pretest 
experiments it was found that unless this con- 
dition obtained predictable effects were not 
observable. 

In the first experiment reported, it was 
shown that hourly workers made to feel over- 
paid display greater productivity than con- 
trols earning the same pay but made to feel 
fairly paid. Although the prediction was de-
rived from cognitive dissonance theory, an
alternative explanation is that the experi-
mental Ss worked harder and produced more 
simply in order to obtain greater job security. 
Having been hired reluctantly by E, they
may have feared losing their jobs, unless 
they could demonstrate their competence by 
working hard. This explanation is not satis- 
factory, however. In the first place, if it were 
valid, the same effects would be predicted for 
pieceworkers. But, as we have seen, piece- 
workers reduce rather than increase their pro- 
ductivity when overpaid. Secondly, in a re- 
lated study, Arrowood (1961) has argued 
that if the alternative explanation were valid, 
the results would obtain only when E was
aware of S’s productivity. He conducted an
experiment in which hourly Ss worked under
overpaid and equitably paid conditions, as in
our first study, and under public and private
conditions. In the public condition Ss turned 
in their completed work to E; in the private 
condition they mailed their work to New 
York with the impression E would not see it. 
Arrowood found that overpaid workers pro- 
duced significantly more in both the public 
and the private conditions. His study, there- 
fore, supports the dissonance-derived predic- 
tion and invalidates the alternative explana- 
tion of our results. 

In the second of our experiments a further 
hypothesis was given support. If workers per- 
ceive they are overpaid—which psycholog- 
ically is an inequity, as is being underpaid— 
and other means of reducing the resulting dis- 
sonance are not readily available to them, 
they will produce more or less than controls 
paid the same wages, depending on whether 
their wages are hourly or piecework, respec- 
tively. A subsidiary finding is that piecework 
results in lower productivity than hourly 
work. While this is clearly contrary to the 
usual assumption that piece rates are an in- 
centive to higher productivity, and further 
research is being undertaken to determine 
the generality of the finding, it is in all likeli- 
hood an artifact of the experiment. In the 
first place, pieceworkers were predicted to 
produce less than hourly workers when over- 
paid. Secondly, because of the relatively high 
wages paid in Experiment II it is likely that 
some dissonance (some feeling of overpay- 
ment) was experienced by the control Ss in 
the H_c and P_c conditions, though less than by
the experimental Ss in H_e and P_e conditions.
As a result we would expect H_c Ss to produce
somewhat more and P_c Ss somewhat less than
if no dissonance at all had been experienced, 
and it would follow that the mean difference 
in productivity between hourly workers and 
pieceworkers would be enhanced. 


PERFORMANCE ON A FIVE-FINGER CHORD KEYBOARD

Discrimination Reaction Times (DRTs) were examined for each of the 31
chords possible with a 5-finger chord keyboard. Each of 4 paid Ss showed
marked improvement in performance over 4000 to 11,000 DRTs on the 31-
chord task despite extensive prior practice on subsets of the 31 chords. DRTs
between .30 and .35 sec. were obtained for the average of all 31 chords. Infor-
mation transmitted from stimulus to keyboard was greater than 4.25 bits. In
terms of average DRTs, the relative difficulties of the 31 chords were very simi-
lar for the 4 Ss. The percentage of error for each of the chords was highly
correlated with the DRTs.


A recent report by Ratz and Ritchie (1961)
describes performance on a “chord keyboard” in
terms of reaction times for the individual chords,
and information transmission rates for several
different keyboards differing in number of, and
specific, chords involved. They report “that little
improvement took place after the second day. . . .
Asymptotic behavior was reached rather quickly
because the order of the presentations was
random.” Each day’s practice lasted 10 minutes.
It is the intent of the present report to show that
performance on this type of task is very far from
asymptotic level after just 20 minutes of practice,
and that skilled subjects exhibit much faster
average reaction times and much higher
information transmission rates than do the
relatively untrained subjects used by Ratz and
Ritchie. In contrast to the information
transmission rate of 4.1 bits per second reported
by Ratz and Ritchie a rate as high as 11 bits per
second of actual transmitted information has
already been obtained in a study by Klemmer and
Muller (1953).


PROCEDURE

In the present experiment the testing situation was
similar to that used by Ratz and Ritchie. A paper
tape reader provided random sequences of patterns
of stimulus lights. The stimulus lights were
arranged to correspond closely to the arrangement
of the response keys; four lights (and keys) were
in a horizontal row, and a fifth light (and key)
was located below the left-most member of the
row. The entire stimulus array was positioned in
front of the subject so as to fall within a visual
angle of 5 degrees or less. The keys were operated
by the right (favored) hand with the thumb resting
on the single key (thumb bar) and the remaining
fingers each positioned over the four remaining
exposed keys (the keyboard was a modified IBM
numeric keyboard with four keys and thumb bar
operative, and all the rest covered and blocked).
Discrimination Reaction Time (DRT) was measured
from onset of a stimulus pattern to the completion of
the depression of (all of) the corresponding key(s).
If the response was correct the stimulus lights went
out when the electronic timer stopped, the DRT
was automatically punched into an IBM card, the
timer reset, the tape reader advanced, the next
stimulus pattern punched into the next card, and
the next pattern of stimulus lights turned on. The
interval between completion of a correct response
and the onset of a new stimulus pattern was
between 2 and 3 seconds with deliberately introduced
variability to discourage rhythmic behavior. If the
subject made an error the automatic sequence halted
and the experimenter punched the error
information into the card. Errors were of three
kinds: one or more extra keys depressed, failure
to depress all required keys within .1 second of each
other, or a combination of the first and second.

Four undergraduate college students were hired
to serve as subjects. Each served in other reaction
time experiments (with the same apparatus) prior
to the one reported here. During each daily
experimental session, taking approximately 30 minutes,
each of the 31 (2^5 - 1) patterns possible with the
5 lights (and keys) was presented 17 times. Four
different random sequences of the resulting 527
patterns were used, the same sequence of patterns being
repeated every fourth session as many times as
needed. In order to stabilize the estimates of average
DRTs those associated with errors are not included
in the analyses, nor are those that were more than
three standard deviation units away from the
mean of the remaining distribution. The percentage
of responses in error for each subject for each of
the last 10 sessions was between 5.1% and 14.0%.
For each of the same sessions, for each subject, the
number of DRTs falling outside of the three
standard deviation range was between .2% and 4.4%
of the number of DRTs for correct responses
(overall was 1.64%). For 32 of the 40 session means the
elimination of the extreme scores yielded shifts of
< .002 second (maximum shift was .004 second).
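
The stimulus set described in the Procedure is easy to enumerate. The sketch below lists every non-empty combination of five keys, confirms the session length of 527 trials, and computes the upper bound on information per response for 31 equally likely chords. The names and the 0/1 tuple representation are assumptions made for illustration.

```python
import math
from itertools import product

N_KEYS = 5                      # four finger keys plus the thumb bar
PRESENTATIONS_PER_PATTERN = 17  # per daily session, from the text


def all_chords(n_keys=N_KEYS):
    """Every non-empty on/off combination of n_keys keys, as tuples of 0/1."""
    return [bits for bits in product((0, 1), repeat=n_keys) if any(bits)]


chords = all_chords()
trials_per_session = len(chords) * PRESENTATIONS_PER_PATTERN
max_bits_per_response = math.log2(len(chords))

print(len(chords))                      # number of distinct chords
print(trials_per_session)               # trials in one session
print(round(max_bits_per_response, 3))  # ceiling on bits per response
```

The ceiling of about 4.95 bits gives a reference point for the transmitted information of more than 4.25 bits reported in the abstract.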


RESULTS 


Figure 1 plots performance as a function 
of practice for each of the four subjects. 
Breaks in the curves indicate points at which 
other DRT conditions intervened for a mini- 
mum of 58 sessions. Beyond the first practice 
session on all 31 patterns the average DRT 
for all subjects is less than .40 second for 
every session. For all subjects there is an 
obvious decrease in average DRT with prac- 
tice for at least the first eight sessions (ap- 
proximately 4,000 DRTs). Subject J had 
previously participated in an experiment in-
volving all 31 possible patterns, but the other
three subjects, while experienced with various 
subsets of the 31 patterns, had never prac- 
ticed in a situation involving all 31. For these 
three subjects average DRT continues to de- 
crease over a still greater number of practice 
sessions. 

The information transmitted (T) from 
stimulus to keyboard (a function of number 
and distribution of errors) is also depicted in 
Figure 1. For two of the subjects this function 
shows an increase over the first three or four 
practice sessions, but for the other two sub- 
jects no such shift is observed. Beyond these 
early practice sessions there is no consistent
shift in the value of T. 
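
For readers unfamiliar with the measure, transmitted information can be computed from a table of stimulus-response counts as T = H(stimulus) + H(response) - H(stimulus, response). The sketch below uses that standard definition; the paper does not show its own computation, and the error-free session used as input is hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def transmitted_information(counts):
    """T in bits, where counts[i][j] is how often stimulus i drew response j."""
    total = sum(sum(row) for row in counts)
    stimulus_p = [sum(row) / total for row in counts]
    response_p = [sum(col) / total for col in zip(*counts)]
    joint_p = [cell / total for row in counts for cell in row]
    return entropy(stimulus_p) + entropy(response_p) - entropy(joint_p)


# Hypothetical error-free session: 31 chords, 17 presentations each.
perfect = [[17 if i == j else 0 for j in range(31)] for i in range(31)]
print(round(transmitted_information(perfect), 3))
```

Errors spread counts off the diagonal and lower T, which is why the text describes T as a function of the number and distribution of errors.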

Thus, while DRT decreases for many prac- 
tice sessions, T does not show a corresponding 
decrease. In other words, faster DRTs are not 
being achieved at the expense of higher error 
rates, and the decreases in DRTs associated 
with practice are reflecting improved per- 
formance in terms of T/DRT. For Subject R 
there is a marked downward shift in both T 
and DRT for Sessions 22 through 31, but 
here the decrease in DRT more than offsets 
the decrease in T, with the resulting per- 
formance in terms of T/DRT showing still 
further improvement for these last 10 sessions. 
Values for T/DRT exceed 13 bits per second 
for all four subjects, and for Subject C they 
reach 15. These values are interpreted in the 
discussion section. 
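
The T/DRT index is simply bits transmitted per response divided by seconds per response. As a rough check, the sketch below combines the figures quoted in the abstract (T greater than 4.25 bits, average DRT between .30 and .35 second); the specific pairings are illustrative assumptions, not session data from the paper.

```python
def bits_per_second(t_bits, drt_seconds):
    """Information rate implied by T bits per response and a DRT in seconds."""
    return t_bits / drt_seconds


# Illustrative pairings of the abstract's boundary values:
print(round(bits_per_second(4.25, 0.35), 1))  # slowest DRT, lowest T
print(round(bits_per_second(4.25, 0.30), 1))  # fastest DRT, lowest T
```

Because T exceeded 4.25 bits for these subjects, rates of 13 to 15 bits per second are consistent with DRTs in the quoted range.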

The data of the last 10 sessions (two
blocks of five) for each of the subjects were
subjected to further analyses in terms of the
31 specific patterns involved. For all subjects
except Subject C, these last 10 sessions were
preceded (though not immediately) by at
least 20 identical sessions (14 for Subject C);
and each of the four subjects had provided
at least 50,000 prior DRTs under various
experimental conditions. For each subject the 
DRTs obtained for each of the 31 patterns 
during the first of the last two blocks of five 
sessions were correlated with those for the 
second block. The correlations for the four 
subjects are: R, .974; C, .986; J, .989; and 
N, .987, indicating little shift in the rela- 
tive difficulty of the patterns for each of 
the subjects. The average DRTs for each of 
the 31 patterns, for the four subjects com- 
bined, were then computed for the last blocks 
of five sessions. Correlations were computed 
between these average values and the cor- 
responding values for the four individual sub- 
jects. The correlations are: R, .942; C, .973; 
J, .960; and N, .924, indicating that the 
relative difficulties of the 31 patterns are 
quite similar for all four subjects. However, 
the slopes of the regression lines for the four 
subjects range from .706 to 1.168, indicating 
that some subjects show a greater difference 
between difficult and easy patterns than do 
others. For the four subjects combined, the
longest DRT was 1.11 times the median
value, and the shortest was .89 times the
median value. The average DRTs for the 31 
chords are listed in Table 1, ordered in terms 
of the average DRTs actually obtained for 
the last block of five sessions. However, the 
population of subjects was too small to attach 
any significance to the precise rankings of 
patterns which are close to each other in the 
list. For example, Pattern 4 yielded the fastest 
DRTs for all four subjects combined, but this 
was due primarily to the performance of Sub- 
ject C. Performance for the three other sub- 
jects indicates Pattern 3 as the fastest. 
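
The two statistics used in this analysis, the correlation between per-pattern DRTs and the slope of a subject's DRTs regressed on the group averages, can be sketched as below. The four data points are hypothetical and chosen for easy arithmetic; the paper's analysis used 31 patterns per subject.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical per-pattern DRTs in milliseconds (not data from the paper).
group_average = [300, 320, 340, 360]
one_subject = [280, 305, 330, 355]

print(pearson_r(group_average, one_subject))
print(regression_slope(group_average, one_subject))
```

A slope above 1 means the subject separates easy from difficult patterns more widely than the group average does; a slope below 1 means the opposite.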

TABLE 1
AVERAGE DRTs AND PERCENTS OF ERROR FOR EACH
OF THE 31 PATTERNS

Pattern (1 2 3 4 5)    DRT (milliseconds)    Error (percent)

Note.—Data obtained fr ... each subject

An examination of errors for individual
chords during the last block of five sessions, 
for all subjects combined, reveals a range of 
1.8% for the chord with least errors to 25.9% 
for the chord with the most. The overall error 
rate was 9.9%. The correlation between aver- 
age DRT and percent error for the 31 chords 
was .836, indicating a strong tendency for 
chords with long DRTs to also have many 
errors, and vice versa. The error rates for the 
31 patterns are also listed in Table 1. 

DISCUSSION

Despite many similarities there is one 
major difference between the procedure of the 
present experiment and that reported by Ratz 
and Ritchie (1961). For the present experi- 
ment the subject stopped the DRT clock 
when he depressed the correct combination of 
keys, following which he had from 2 to 3 
seconds before being called on for another 
DRT. When the keys were released had no 
effect on the timing. Thus, the response-to- 
stimulus interval was machine determined. 
For their experiment, the subject “stopped 
the clock” when he depressed all appropriate 
keys, but the interval between this response 
and the next stimulus was under the control 
of the subject, since the next stimulus did not 
occur until .1 second after all keys were re- 
leased. No mention is made of the magni- 
tudes of this “waiting time” during which the 
subject kept his hands on the keys. Nor is
any mention made of whether the subject was
instructed to minimize DRT or to minimize
average time between responses. It is reason-
able to assume that the different instructions
would lead to different overall levels of DRTs,
with instructions to minimize DRTs leading
to a situation very similar to that of the
present experiment. However, it does not seem
reasonable to expect that, if the instructions
were to minimize time between responses, the
DRTs would be as much as three times larger
than if instructions were to minimize DRTs.
Nor does it seem reasonable to expect that the
difference in instructions would lead to con-
tinued improvement (for experienced sub-
jects) for 4,000 to 11,000 DRTs in the
present experiment, while producing no fur-
ther improvement (for naive subjects) after
20 minutes (presumably fewer than 1,000
DRTs; the number is not reported) of prac-
tice in the Ratz and Ritchie experiment.

In further support of the argument that the 
Ratz and Ritchie subjects were far from 
“asymptotic behavior” a report by Klemmer 
and Muller (1953) is cited. Klemmer and 
Muller ran three highly trained subjects un- 
der conditions very similar to those reported 
by Ratz and Ritchie for all chords of one 
hand, with the exception that a new light pat- 
tern appeared .02 second after a response was 
made and no “wait” time was involved. Klem- 
mer and Muller report an average rate of ap- 
proximately 2.5 responses per second, which 
translates to an average “reaction time” of .4 
second. Thus, even if the Ratz and Ritchie 
subjects were under instructions to minimize 
time between responses their reaction times 
were three times as long as those reported by 
Klemmer and Muller for subjects run under 
the same instructions. 

The absolute levels of average DRTs in the 
present experiment are less than 1/3 the values
reported by Ratz and Ritchie, but the order- 
ing of the 31 specific chords according to 
DRT is very similar in both experiments, the 
rank-order correlation being .896. In com- 
paring individual chord DRTs with the 
median DRT for the 31 chords Ratz and 
Ritchie (1961) report that the smallest of 
the 31 DRTs was approximately .88 (esti- 
mated from their Figure 3) times the value 
for the median DRT. This agrees well with 
the ratio of .89 obtained in the present ex- 
periment. However, they report a ratio of ap- 
proximately 1.27 (also from their Figure 3) 
for the largest DRT, while the corresponding 
ratio in the present experiment is only 1.11. 
In terms of their “Application of coding the- 
ory” this indicates that the “optimum fre- 
quency distribution” for the 31 chords should 
show less difference in frequency for the fast- 
est versus the slowest chords. With their op- 
timum distribution they indicate that “The 
improvement to be expected is of the or-
der of 5% in the net information rate.”
Utilizing the present findings of smaller ratios
the percentage gain would be still smaller. It
appears that with additional practice, how-
ever, a gain in net information rate between
200% and 300% might have been achieved.
They report 4.1 bits per second. Klemmer and
Muller (1953) report 11 bits per second in
an almost identical situation. The results of 
the present experiment, ignoring the time be- 
tween the completion of response and onset of 
next stimulus, can only suggest upper limits 
for the continuous transcription situation, 
the values being between 13 and 15 bits per 
second. The magnitude of the time required 
to produce one chord pattern immediately 
after another versus the gain in rate of re- 
sponding made possible by being able to 
“look-ahead” must be known before values 
derived from DRTs can be applied to pre-
cisely predict performance in a continuous
transcription situation.
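
The suggested gain of 200% to 300% follows from the rates quoted in this paragraph. Taking the Ratz and Ritchie figure of 4.1 bits per second as the baseline, the sketch below computes the percentage gain implied by 11, 13, and 15 bits per second. The helper name is hypothetical, and the calculation only restates figures given in the text.

```python
def percent_gain(baseline, improved):
    """Percentage increase of improved over baseline."""
    return 100.0 * (improved - baseline) / baseline


RATZ_RITCHIE_BITS_PER_SECOND = 4.1  # from the text

for rate in (11, 13, 15):
    print(rate, round(percent_gain(RATZ_RITCHIE_BITS_PER_SECOND, rate)))
```

The 13 and 15 bits per second values, which the text treats as upper limits, bracket the 200% to 300% range; the Klemmer and Muller rate of 11 bits per second corresponds to a somewhat smaller gain.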

Though the lack of practice of the Ratz
and Ritchie subjects appears to have pro-
duced a threefold (or more) overestimate of
reaction times for the 31-chord keyboard, this
factor of three is small compared to the
amount of overestimate for the “all chords,
two hands” keyboard. On the basis of a six-
minute test they report an average of 2.63
seconds per response for the “961 allowable
10-finger patterns.” Data currently being col-
lected for a similar situation (1,023 possible
patterns) continue to show improvement after
as many as 50,000 responses, with average
DRTs consistently below .5 second towards
the end of practice. These DRTs suggest in-
formation transmission rates approaching
bits per second. 


A VOCATIONAL INTEREST SCALE FOR BIOLOGISTS


LOUIS M. HERMAN,1 CARL A. LINDSAY, AND MARTIN L. ZEIGLER


Student Affairs Research, Pennsylvania State University

A Biologist scale for the SVIB (Form M) was developed following procedures
outlined by E. K. Strong, Jr. 4 groups were employed: (a) criterion (N = 251)
systematically selected from Volume II of American Men of Science, (b)
cross-validation (N = 89) selected at an American Institute of Biological
Sciences convention, (c) 2 concurrent validation groups (Ns = 121, 306)
selected from the Pennsylvania State University student body. Results in-
dicated that the scale differentiated the interests of: (a) the biologists from
Strong’s men-in-general group (P:), (b) the biologists from the interests
measured by 36 other SVIB scales, (c) the concurrent validation groups in
the expected direction. Reliabilities of .88 (criterion) and .87 (cross-validation)
were obtained. It was concluded that the scale has sufficient validity and
reliability to be a useful counseling device.


The work of E. K. Strong, Jr. on the meas- 
urement of adult interests has been consider- 
ably extended since his publication of the 
scales for the revised Vocational Interest Blank 
for Men (SVIB) in 1938. Several additional 
scales have been developed and some original 
scales revised (e.g., Kreidt, 1949; Strong, 
1945, 1946, 1949; Strong & Tucker, 1952).
A scale for the biological sciences, however,
has not been developed previously, possibly
because of the difficulty in defining and
delimiting the career area of biological sci- 
ence. For example, at the present time there 
are 24 membership societies and 24 affiliated 
societies comprising the American Institute of 
Biological Sciences (AIBS). 

At the Pennsylvania State University, 
which employs the SVIB as part of the test 
battery routinely administered to entering 
freshmen, presumably related scales (Physi- 
cian, Mathematician, Experimental Psychol- 
ogist, Artist, and Dentist) had been used as 
a basis for judging interest in the biological 
sciences area. 

The need for a separate Biologist scale was 
indicated by (a) the lack of empirical evi- 
dence to support the assumption of “related 
scales” and (b) a relatively large dropout rate
among students in the biological science area, 
many of whom appear to lack sufficient moti- 
vation or interest in this area. 

It was recognized that such a scale would
have to limit its scope to certain types of
biologists. It was decided arbitrarily to de-
velop a scale based upon biologists primarily
working in teaching or research areas. This
choice eliminated those biologists engaged in
technician-type work or in applied services,
and placed emphasis on professional, ad-
vanced-degree biologists.

1 Now with Advanced Systems Research Group,
North American Aviation, Incorporated, Columbus,
Ohio.


METHOD 
Criterion Sample 


During 1958, a systematic sample of 500 biologists
was drawn from the approximately 30,000 names
listed in the American Men of Science, Volume II,
The Biological Sciences (Cattell, 1955). Restrictions
in sampling were exclusion of females, persons 64
or over, and those whose only advanced degree was
the MD. The latter restriction was an attempt to
provide for differentiation from the existing Physi-
cian scale. According to Cattell,2 criteria employed
in considering the listing of an individual within
this volume included the amount of original re-
search published, an academic position, membership
in scientific societies, extracurricular positions of
importance, and medals or honors of various kinds.
The sample, therefore, mainly consisted of biologists
interested in research or teaching within an academic
setting.
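
A systematic sample of this size corresponds to taking roughly every 60th name from the directory. The sketch below shows one common way to draw such a sample, with a random starting point; the paper does not describe the exact procedure, so the random start, the function name, and the use of list positions in place of names are all assumptions. The sampling restrictions and substitute names described in the text are not modeled.

```python
import random


def systematic_sample(population_size, sample_size, rng=None):
    """Positions of every k-th entry, starting at a random offset below k."""
    rng = rng or random.Random()
    interval = population_size // sample_size
    start = rng.randrange(interval)
    return [start + i * interval for i in range(sample_size)]


positions = systematic_sample(30_000, 500, random.Random(0))
print(len(positions), positions[1] - positions[0])

# Return rate implied by 251 usable blanks out of 500 mailed:
print(round(251 / 500, 3))
```

With 30,000 names and 500 draws the interval is 60, and the 251 usable returns amount to about 50% of the sample, as the text states.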


Each member of the sample was mailed an ex- 
planatory letter and a copy of the SVIB. Two 
follow-up letters were used. Substitute names were 
used for the approximately 5% of the 
deceased or letters 
“address unknown.” A total of 251 usable SVIBs 
were obtained through these procedures, or ap- 
proximately 50% of the sample. The mean age of 
this group (called the criterion 


sample who 
were returned 


were whose 


group) was 


2 J. Cattell, personal communication, 1959 
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TABLE 1 


DISTRIBUTION OF BIOLOGICAI 


SCIENCES SPECIALTIES FOR THE 


CRITERION AND CROSS-VALIDATION GROUPS 


Criterion group 


Specialty 


Agronomy 

Anatomy 

Bacteriology 

sotany 

Ecology 

Genetics 

I hthyology and Her} etology 
Microbiology 

Mycology 


Parasitology 


w 


Plant Pathology 


_ 


Souw 


Plant Physiology 
Protozoology 
Verterbrate Paleontology 


Zoology 


_ 


to 
uN 
—_ 


years, with an SD of 8.6. Broad classification of the 
occupations of this group revealed that 54% held 
teaching positions, 349% were in research, 10% in 
administration, and 2% in miscellaneous positions. 

Standard procedures (Strong, 1943) were fol- 
lowed in calculating the scoring weights to be used 
with the Biologist scale for each of the three re- 
sponse alternatives for the 400 items of the SVIB. 
The Biologist scale is being published by the Stan- 
ford Press and the scoring weights for the scale 
may be obtained from the Press. 


Cross-Validation Sample 


A sample of 350 biologists attending the 1959 
convention of the AIBS at the Pennsylvania State 
University were asked to complete the SVIB.3 The 
same sample restrictions were employed for the 
cross-validation group as had been used for select- 
ing the criterion group; an additional restriction, 
however, was that only biologists would be sampled 
from 11 of the 16 societies of the AIBS attending 
the convention. These 11 societies were judged to be 
most representative of biologists pursuing research 
in an academic setting. From this total sample, 89 


3 It had been planned originally to sample 754 
biologists, but due to the difficulty of distributing 
the SVIBs to the selected biologists within the short 
span of the convention, this number was reduced 
considerably. 

4 Acknowledgment is made to L. R. Kneebone of 
the Division of Biological Sciences at Pennsylvania 
State University for his assistance in selecting the 
appropriate societies. 


Number 


_~ 


Cross-validation group 


Percent Number Percent 
6.0 0 0.0 
1.6 0 0.0 
3.6 2.2 
15.1 14.6 
0.4 1.1 
6.8 
0.0 
2.2 
0.0 
3.4 
16.9 


g 6.8 
2.0 7? 
0.0 1 
40.0 38 42 


Al 
7 
100.0 


usable blanks were returned. These were scored using 
the item weights previously derived from the crite- 
rion group; they also were scored (as was the 
criterion group) on an additional 36 occupational 
scales used at the Pennsylvania State University.5 
The mean age of the cross-validation group was 
39.4 years, with an SD of 10.0. Broad classifications 
of their occupations revealed that 56% held teaching 
positions, 34% were in research, and 10% in admin- 
istration. A comparison of the names of the two 
samples of biologists revealed that there was no 
overlapping between the two groups. Table 1 compares
the biological science specialties of the criterion
and cross-validation groups. For the criterion group,
these specialties were determined from the listing in
American Men of Science; for the cross-validation
group, specialties were determined from a short
questionnaire completed together with the SVIB.


RESULTS AND DISCUSSION 
Criterion versus Cross-Validation Group 


Table 2 compares means and letter-grade
distributions of the criterion and cross-validation
groups on the Biologist scale. Neither the difference
between the means nor the difference between the
letter-grade distributions was significant (CR
= #2; χ² = 2.38).

The correlation between the Biologist scale 
and each of the remaining 36 scales was 
calculated for each of the two groups. For 


5 The scales used are given in Table 3.

TABLE 2

MEAN RAW SCORE AND LETTER-GRADE DISTRIBUTION
ON THE BIOLOGIST SCALE OF CRITERION AND
CROSS-VALIDATION GROUPS

Criterion group (N = 251)      Cross-validation group

Mean raw score           161.49
SD                        60.34

Percent letter-grades
                          70.9
                          13.1
                          10.0
                           6.0

Note. Letter-grades are half SD units; A includes [...]
half SD below the mean of the cross-validation group [...];
B+ is from one SD to one-half SD below the mean, etc.


both groups, the Physician scale correlated 
highest (.85 for the criterion and .89 for the 
cross-validation group); Real Estate Sales 
correlated lowest for both groups (-.73 and 
-.71, respectively). The only significant dif- 
ference found between the scale correlations 
for the two groups was for the Musician scale 
(cross-validation group higher, CR = 2.16). 
It is apparent that the two groups correspond 
very closely. 


Reliability 


The odd-even reliability of the Biologist 
scale, using the criterion group, was .79, cor- 
rected by the Spearman-Brown formula to 
.88; for the cross-validation group the reli- 
ability was .77, corrected to .89. These cor- 
relations correspond approximately to the 
magnitudes of the reliabilities found for other 
interest scales of the SVIB. 
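
The Spearman-Brown correction mentioned above can be sketched in a few lines. This is only an illustration of the general formula, r_full = k r / (1 + (k - 1) r), with k = 2 for an odd-even split; the function name and its `length_factor` parameter are hypothetical and are not taken from the article.

```python
def spearman_brown(r_part: float, length_factor: float = 2.0) -> float:
    """Estimate full-test reliability from a part-test correlation.

    r_part is the correlation between the parts (here, odd vs. even items);
    length_factor is how many times longer the full test is than one part.
    """
    return length_factor * r_part / (1.0 + (length_factor - 1.0) * r_part)


if __name__ == "__main__":
    # Odd-even correlation reported for the criterion group.
    print(f"{spearman_brown(0.79):.2f}")  # prints 0.88
```

The criterion-group figure reproduces the printed value of .88. Applied to the cross-validation coefficient of .77, the same formula gives roughly .87 rather than the printed .89; the source of that gap cannot be recovered from this text.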


Differentiation among Scales 


Two types of scale differentiation were investigated:
(a) the extent to which the Biologist scale
differentiated interests of the cross-validation
group from 36 other interest scales of the Strong
and (b) the extent to which the Biologist scale
differentiated interests of the cross-validation
group from men-in-general.

Table 3 shows the mean raw scores obtained by the
cross-validation group on the Biologist scale and on
each of the remaining 36 scales. Critical ratio tests
revealed that the mean raw score of the cross-validation
group on the Biologist scale was significantly
greater than the mean scores obtained by that same
group on any of the remaining scales. It appears that
the Biologist scale weights interests differently than
do the remaining scales.


TABLE 3

MEAN AND STANDARD DEVIATIONS OF RAW SCORES ON
37 OCCUPATIONAL SCALES FOR THE CROSS-VALIDATION GROUP

Scale                            M

Biologist                        164.40
Physician                        90.93
Dentist                          37.67
Veterinarian                     15.31
Clinical Psychologist*           19.39
Architect                        54.52
Artist                           43
Chemist
Engineer
Mathematician
Experimental Psychologist*
Farmer
Carpenter
Forest Serviceman
Math Physical Science Teacher
Social Science Teacher
YMCA Physical Director
Personnel Manager
Public Administrator
YMCA Secretary
Guidance Counselor
Minister
Accountant
Office Worker
Purchasing Agent
Banker
Industrial Relations*
Pharmacist
Sales Manager
Life Insurance Salesman
Real Estate Salesman
Advertising
Author-Journalist
Lawyer
Production Manager
Musician
Army Officer


TABLE 4

MEAN RAW SCORE AND LETTER-GRADE DISTRIBUTION
ON THE BIOLOGIST SCALE OF STUDENTS IN BIOLOGICAL
SCIENCE AND NON-BIOLOGICAL SCIENCE CURRICULA

Biological science (N = 121)      Non-biological science (N = 306)

Mean raw score
SD
Percent letter-grades

45.85
69.93
2.9

Note. Letter-grades [...] the mean [...] SD to one-half SD below [...]


Strong’s P, men-in-general group would
have achieved a mean raw score of only
15.51 if scored on the Biologist scale. Although
it was not possible to derive precisely
the standard deviation of this men-in-general
group on the scale, the mean difference
appears sufficient to conclude that the
scale differentiates very well between biologists
and this men-in-general population.


Differentiation among Groups 


A post hoc investigation was made of the
degree to which the Biologist scale could have
differentiated between the interests of freshmen
who later majored in the biological sciences
and those who did not. During 1960, a
total of 121 SVIBs, completed 2-3 years
previously by current upperclass male students
(juniors and seniors) majoring in biological
science curricula, were rescored on the
developed Biologist scale. These scores then
were compared with similarly obtained scores
of a random sample of 306 male upperclass
students (of a total of approximately 4,300
such students) majoring in curricula other
than biological science. Table 4 shows the
mean raw scores and letter-grade distributions
of these two groups on the Biologist scale.
The mean raw score of the biological science
group is significantly greater than that of the
comparison group (CR = 4.34). Letter-grade
distributions also are significantly different
(χ² = 61.65). It is concluded that the scale
would be a useful tool for counselors in
making recommendations for the biological
science area; in addition, it is likely that the
scale has validity as a predictor of later choice
of the occupation of biologist.
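
The critical ratio (CR) used for the mean comparisons above is conventionally the difference between two independent means divided by the standard error of that difference. The sketch below assumes that definition; the article does not spell out its computation. The two means are the figures legible in Table 4, on the assumption that the larger belongs to the biological science group, and the standard deviations of 52.0 are invented placeholders because the printed values are not recoverable.

```python
import math


def critical_ratio(m1: float, sd1: float, n1: int,
                   m2: float, sd2: float, n2: int) -> float:
    """Difference between two independent means in standard-error units."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / standard_error


if __name__ == "__main__":
    # Means assumed from Table 4; both SDs are hypothetical stand-ins.
    cr = critical_ratio(69.93, 52.0, 121, 45.85, 52.0, 306)
    print(f"CR = {cr:.2f}")
```

A CR beyond about 1.96 in absolute value corresponds to the .05 level for a two-tailed normal test, which is the usual yardstick for values such as the reported 4.34.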


Normative Data


Table A presents the normative data for
the Biologist scale. Shown are raw scores,
converted standard scores, and the letter-grade
equivalents. Percentile ranks for the
criterion and cross-validation groups, and for
the two undergraduate groups, also are given.
The standard scores and corresponding letter-grades
are based upon the cross-validation
group. The table illustrates the close correspondence
between the criterion and cross-validation
groups and the differentiation between
the two undergraduate groups.

Table 5 shows the letter-grades and corresponding
raw and standard score intervals.
These intervals should be used when assigning
letter-grades to an individual for the
Biologist scale. The table also indicates the
percentage of the cross-validation group receiving
the indicated ratings.


6 Table A showing the normative data for the
Biologist scale has been deposited with the American
Documentation Institute. Order Document No. 7125
from the ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington
25, D. C., remitting in advance $1.25 for microfilm
or $1.25 for photocopies. Make checks payable
to: Chief, Photoduplication Service, Library of Congress.


TABLE 5

LETTER-GRADE EQUIVALENTS FOR RAW AND STANDARD SCORE
INTERVALS: THE PERCENTAGE OF THE CROSS-VALIDATION
GROUP RECEIVING EACH RATING

Letter-grade | Raw score | Standard score | Percent cross-validation group so rated

Raw score entries legible: 137 and up; 110-136; 83- ; 28 and below
Standard score entries legible: 45 and up; -44; -39; -34
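
The relation between raw scores, standard scores, and letter-grades described for Table 5 can be sketched as follows. Several things here are assumptions rather than the article's own procedure: the standard score is taken as 50 + 10z; the mean of 164.40 is the cross-validation Biologist mean from Table 3; the SD of 54.8 is inferred from the 27-point spacing of the legible raw-score cutoffs; and the grade labels below B+ follow the customary Strong half-SD convention, since they are not legible in the table.

```python
def standard_score(raw: float, mean: float, sd: float) -> float:
    """Linear standard score with mean 50 and SD 10."""
    return 50.0 + 10.0 * (raw - mean) / sd


def letter_grade(std: float) -> str:
    """Map a standard score to a letter-grade in half-SD (5-point) steps."""
    # Lower bounds are assumed; only 45, 44, 39, and 34 are legible in Table 5.
    cutoffs = [(45, "A"), (40, "B+"), (35, "B"), (30, "B-"), (25, "C+")]
    for lower_bound, grade in cutoffs:
        if std >= lower_bound:
            return grade
    return "C"


if __name__ == "__main__":
    mean, sd = 164.40, 54.8  # sd is an inferred, hypothetical value
    for raw in (180, 100, 20):
        score = standard_score(raw, mean, sd)
        print(raw, round(score, 1), letter_grade(score))
```

Raw scores that fall exactly on an interval boundary depend on how the original authors rounded, which the text does not state, so the example avoids them.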
AIRCRAFT CONSPICUITY AND FLIGHT ATTITUDE 
INFORMATION PROVIDED BY EXTERIOR 
PAINT PATTERNS 1 


KENNETH G. COOK, RICHARD M. BEAZLEY, AND JOHN E. ROBINSON, JR. 


Applied Psychology Corporation 


Experiments were conducted to determine the relative conspicuity of air- 
craft exterior paint patterns, and to investigate whether such paint patterns 
aided pilots in determining the attitude of the aircraft. The conspicuity studies, 
using paired comparisons of model airplanes, gave evidence that: (a) amount 
of red-orange fluorescent paint coverage is positively correlated with con- 
spicuity; (b) high brightness paints should be placed on the upper surfaces 
of the aircraft and low brightness paints on the lower portions; (c) maximizing 
brightness contrasts between different parts of the aircraft surfaces does not 
enhance conspicuity; (d) flight attitudes, backgrounds, lighting conditions, 
and differences in Ss did not affect conspicuity significantly. The attitude 
studies in which pilots matched the model airplanes in some 1 of 15 attitudes, 
with 1 of 15 models mounted on a small display, indicated that the paint 
patterns used did not aid the pilots in making judgments of aircraft attitude. 
Differences in backgrounds and lighting conditions did not greatly affect Ss’ 
ability to determine attitude. 


Of the approximately 40 mid-air collisions 
occurring in the United States each year, 
more than 90% occur during daylight hours, 
and under weather conditions good enough to 
permit flight under Visual Flight Rules. 

In recent years, much research has been 
directed toward the use of fluorescent paints 
as a means of making aircraft more con- 
spicuous, thus helping pilots to avoid col- 
lisions. Various investigations have centered 
about these determinations: (a) efficacy of 
fluorescent paint, per se; (b) particular paint 
patterns which will provide best conspicuity 
against the widest variety of backgrounds, 
colors, textures, and brightnesses encountered 
in flight; and (c) paint patterns which will 
give a visual indication of flight attitude 
which might be useful in deciding on maneu- 
vers to avoid collisions. 

Very little adequately controlled research to 


1 These experiments were conducted as part of a 
large-scale research effort on visual mid-air collision 
avoidance, sponsored by the Federal Aviation Agency 
under Contract FAA/BRD-127. 

The authors wish to express appreciation to H. 
Richard van Saun of the Federal Aviation Agency, 
who was monitor for the contract at the time these 
studies took place, and to Robert B. Sleight, Ap- 
plied Psychology Corporation, for their aid in this 
series of studies. 


obtain quantified results has been done on 
aircraft paint patterning. This is particularly 
true of studies comparing patterns on iden- 
tical aircraft under controlled illumination 
conditions against identical backgrounds when 
the aircraft attitudes presented to the ob- 
server were identical. One exception to this is 
the study of Wagner and Blasdel (1948; sum- 
marized in Lazo & Bosee, 1961) who tested 
37 aircraft paint patterns in eight positions 
against three backgrounds. On the most 
visible pattern, the rear half of the wing, the 
elevator surfaces, and the rudder were painted 
glossy sea-blue. This study antedated the use 
of fluorescent paints, which were hence not 
evaluated. Further, the patterns tested were 
aimed at finding specific patterns rather than 
comparing conceptual schemes into which 
specific patterns could be devised. 

The possibility of using visual coding to 
indicate flight attitude deserves investigation, 
since attitude is one of the two important 
cues which pilots indicate they use to evaluate 
collision probabilities.2 If the attitude (or
aspect) of an aircraft can be accurately perceived,
then some judgment can be made of
the direction in which it is moving.3


2 The other is apparent relative motion. If no
relative motion is apparent, then there is a high
probability of colliding with the other aircraft, unless
one or the other changes course or airspeed.


PURPOSE 

Two groups of five experiments each were 
carried out in the laboratory, the first to ex- 
amine the usefulness of a number of paint 
patterns in enhancing an aircraft’s con- 
spicuity, the second, to test the effectiveness 
of paint patterns in improving pilot judg- 
ments of the flight attitudes of proximate 
aircraft. 

Both groups of experiments used the same 
paint patterns, which were based on present 
practice or on concepts suggested in aviation 
literature. They were: (a) Officially Specified 
Patterns: Paint patterns which at the time of 
the experiment were being used by the Air 
Force, Navy, Coast Guard, and Federal Avia- 
tion Agency; (b) External Contrast: Maxi- 
mized color and brightness contrast between 
the aircraft and its predominant environ- 
ments; (c) Internal Contrast: Differential 
brightness contrast between different parts of 
the aircraft surface; (d) Point Coding: Incre- 
ments in the number of painted extremities 
(tail, nose, wingtips) of the aircraft; and (e) 
Sector Coding: Color-coding of the four pri- 
mary aspects of the aircraft (front, rear, left, 
right). 

The official patterns in Experiment 1 were 
evaluated because they are representative of 
present operational patterns. The patterns 
used in Experiments 2 and 3 were designed 
to test hypotheses thought to have a bearing 
on increasing conspicuity, while in Experiments
4 and 5 an attempt was made to enhance
information about aspect. While the
patterns were devised on different bases
(operational, conspicuity, aspect), all of them
were evaluated in the first group of experiments
only in terms of conspicuity, and in the
second group, only as aids to improved judgments
about flight attitudes.


METHOD 


Subjects. In each experiment, 5 pilots, drawn
from a total number of 24, served as subjects (Ss).
Of the total, 15 were commercial airline pilots or
co-pilots; 4 were military pilots; 3, civilian pilots;
and 2, retired military pilots. All but 1 S had far
visual acuity of 20/20 or better. The exception had
far visual acuity of 20/30. Depth perception, color
vision, and lateral and vertical phorias were within
normal limits for all Ss.

3 Flight attitude is defined as the specification of
the position of the aircraft with reference to its
three axes of rotation, i.e., pitch, roll, yaw. Aircraft
"aspect" is the particular view of the aircraft
from a given observing position in space.

Experimental design. For both groups of studies,
a graeco-latin square analysis of variance design
was used. For the conspicuity studies, there were
five levels each of flight attitudes, backgrounds,
lighting conditions, and Ss. Inferences concerning the
effects of paint patterns were made by t tests of the
differences between means of each pattern. For the
attitude experiments, flight attitudes became the
dependent variable, hence the paint patterns could
be included in the basic graeco-latin square.
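
A 5 x 5 graeco-latin square of the kind named above can be built from two orthogonal Latin squares. The construction below is a standard textbook one for odd orders and is offered only as an illustration; the article does not give its actual square, and the assignment of Ss, lighting conditions, attitudes, and backgrounds to rows, columns, and symbols is hypothetical.

```python
def graeco_latin_square(order: int = 5):
    """Return an order x order grid of (latin, greek) symbol pairs.

    Uses latin = (i + j) mod n and greek = (i + 2j) mod n, which yields
    orthogonal Latin squares whenever n is odd.
    """
    if order % 2 == 0:
        raise ValueError("this construction needs an odd order")
    return [[((i + j) % order, (i + 2 * j) % order) for j in range(order)]
            for i in range(order)]


def is_graeco_latin(square) -> bool:
    """Check the Latin property of both layers and that every pair is unique."""
    n = len(square)
    for layer in (0, 1):
        for k in range(n):
            row_symbols = {square[k][j][layer] for j in range(n)}
            col_symbols = {square[i][k][layer] for i in range(n)}
            if len(row_symbols) != n or len(col_symbols) != n:
                return False
    return len({cell for row in square for cell in row}) == n * n


if __name__ == "__main__":
    # Hypothetical reading: rows = Ss, columns = lighting conditions,
    # first symbol = flight attitude, second symbol = background.
    for row in graeco_latin_square(5):
        print(row)
```

Because every (attitude, background) pair occurs exactly once and each level appears once per row and column, four five-level factors can be balanced in 25 cells instead of 625.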


Stimulus materials. Plastic model reproductions of
aircraft were used in all studies. The "officially
specified" patterns were applied to models of the
Convair 440, having a wingspan of approximately
2 inches, representing a scale of 1/520. Lockheed
Electra models were used in all other studies. The
Electra models had a wingspan of 254 inches, representing
a scale of 1/456.

Six colors were used in the patterns: aluminum,
red-orange fluorescent, green fluorescent, black, white,
and the natural gray of the plastic model. Where
fluorescent paints were used, the model was first
painted aluminum. In accordance with manufacturers'
instructions, a white undercoating was applied to
areas which were to receive the fluorescent paint.
The clear coat normally used over fluorescent paint
to increase durability in sunlight was not needed in
a laboratory study.

Apparatus. The apparatus for displaying the
models consisted of a tubular aluminum framework
which supported lights, a translucent screen on
which the background slides were projected, frames
in which the models could be supported, and
rheostats to control lighting.
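
The model scales given under Stimulus materials determine what real-world range a laboratory viewing distance stands for. As a rough check, the sketch below multiplies the 37-foot seating distance reported later in this section by each scale denominator and converts to miles. The function name is hypothetical, and statute miles (5,280 feet) are assumed because that is the unit that matches the printed figures.

```python
FEET_PER_STATUTE_MILE = 5280


def simulated_distance_miles(viewing_distance_ft: float,
                             scale_denominator: float) -> float:
    """Full-scale distance represented by viewing a 1/scale_denominator model."""
    return viewing_distance_ft * scale_denominator / FEET_PER_STATUTE_MILE


if __name__ == "__main__":
    print(round(simulated_distance_miles(37, 520), 1))  # Convair 440 models: 3.6
    print(round(simulated_distance_miles(37, 456), 1))  # Electra models: 3.2
```

Both results agree with the simulated viewing distances of 3.6 and 3.2 miles stated in the text.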


Lights above and below the model position simulated
sunlight, skylight (sunlight redirected by the
sky), and ground light (sun and skylight reflected
from the ground). A 35-millimeter slide projector
was used to project the background slides onto the
rear of the translucent screen; brightness was
controlled by increasing or decreasing the voltage
input to the projector lamp. In each experiment,
five background slides were used, chosen from the
following: (a) cloudless blue sky; (b) cloudy sky,
textured white to gray; (c) calm sea with very
slight rippling texture; (d) dense pine forest; (e)
high altitude scene of desert mountains; (f) snow
area; (g) textured brown earth; (h) highly textured
green grass.


The scale models were mounted on pairs of almost
invisible wires (.0035-inch diameter) which
were strung vertically on a rectangular tubular
aluminum frame. The fittings holding the vertical
wires permitted 360° rotation to vary the aircraft
heading, and could be turned up or down 45° from
horizontal to change aircraft pitch. A small fixture
on the vertical wires allowed the model to be placed
level or in a 45° left or right bank. In all, 72 attitudes
were possible; of these, 5 were selected for the
conspicuity studies (head-on, broadside, climbing,
turning, and descending views), and 15 were selected
for the attitude studies.

Draperies completely screened the apparatus (except
the displayed models and screen) from the Ss.
A rectangular opening 40 inches wide by 30 inches
high permitted a view of each presentation (pairs of
models on the conspicuity experiments, single models
in the attitude experiments). The duration of exposure
was controlled by the experimenter who operated
a spring-tension window shade mounted in the
rectangular opening.
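
The figure of 72 possible attitudes quoted above is consistent with three pitch settings (down 45°, level, up 45°), three bank settings (left 45°, level, right 45°), and eight headings. The eight headings at 45° steps are an assumption made here to account for the total; the text says only that the fittings allowed 360° rotation.

```python
from itertools import product


def enumerate_attitudes(heading_step_deg: int = 45):
    """List (heading, pitch, bank) combinations, all angles in degrees."""
    headings = range(0, 360, heading_step_deg)  # assumed 45-degree steps
    pitches = (-45, 0, 45)  # down, level, up
    banks = (-45, 0, 45)    # left bank, level, right bank
    return list(product(headings, pitches, banks))


if __name__ == "__main__":
    print(len(enumerate_attitudes()))  # prints 72
```

Under this reading, the 5 conspicuity attitudes and the 15 attitude-study attitudes are subsets chosen from these 72 combinations.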
Simulated sunlight, sky-ground light, and background
illumination levels were controlled by the
experimenter from a console in front of the apparatus.
Five combinations of light settings were used
in each experiment. These varied slightly across the
studies. For the conspicuity experiments, brightness
at the model positions ranged from 2.2 to 83.0 foot-lamberts,
and background illuminations ranged from
.4 foot-lambert for the calm sea at its lowest
illumination to 37.0 foot-lamberts for the cloudy sky
at its highest level. For the attitude experiments,
brightness readings ranged from 2.7 to 51.0 foot-lamberts.
Background brightness ranged from .5
foot-lambert for pine forest at the lowest background
setting to 37.0 foot-lamberts for cloudy sky
at the highest setting.

Ss were seated at a table 37 feet from the models.
This simulated a viewing distance of 3.6 miles for
the Convair 440 and 3.2 miles for the Lockheed
Electra models. An indirect light between the apparatus
and the S provided an illumination of .38 foot-candle
at the S’s writing surface.

Procedure. Ss were tested singly. Each was told the
general objective of the particular study. One experimenter
was positioned behind the draperies to mount
the models called for, while another was at the console
(near the subject), where he controlled light
settings and background presentation. After a problem
was completely set up, the subject was given 2
seconds to observe it. The timing of the exposure was
governed by raising and lowering the window shade
mounted in front of the models and background.

For the conspicuity studies Ss were asked to judge
which of two models in a paired comparison was
more conspicuous and to indicate his choice on an
answer sheet. Each S received 10 practice problems,
followed by 20 problems under each of 5 combinations
of lighting, background, and attitude for a total
of 100 problems. (Twenty problems result from presenting
each pattern of the 5 once on the left side
and once on the right side with every other pattern.)
The conspicuity score for a pattern consisted of the
total number of times it was selected in the paired-comparison
presentations.
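
The paired-comparison bookkeeping just described can be sketched as follows: every pattern is shown once on the left and once on the right with each other pattern, and a pattern's score is the number of times it is chosen. The pattern labels and the "judge" in the usage lines are invented purely to exercise the code.

```python
from itertools import permutations


def presentation_pairs(patterns):
    """All ordered (left, right) pairings of distinct patterns."""
    return list(permutations(patterns, 2))


def conspicuity_scores(patterns, chosen):
    """Count how often each pattern was picked as the more conspicuous one."""
    scores = {name: 0 for name in patterns}
    for name in chosen:
        scores[name] += 1
    return scores


if __name__ == "__main__":
    # Hypothetical labels and a made-up judge who always prefers the
    # pattern listed earlier; real Ss would be less consistent.
    patterns = ["P1", "P2", "P3", "P4", "P5"]
    pairs = presentation_pairs(patterns)
    picks = [min(left, right, key=patterns.index) for left, right in pairs]
    print(len(pairs), conspicuity_scores(patterns, picks))
```

Five patterns give 20 ordered pairs per condition, hence 100 problems over 5 conditions, and each pattern can be chosen at most 8 times within one condition.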

For the attitude studies, models were presented
singly. With the exception of those based on sector
coding, patterns on the models were not shown to Ss
at close range, nor were they described to them.
(Performance on the sector-coded patterns was
deemed to be directly dependent upon knowledge
of the pattern; for example, coding of the right wing
with green fluorescent paint would be of little use
unless Ss knew it was the right wing [and not the
left] which carried the green. Therefore, Ss were
taught this pattern and allowed to view each variation
at close range.) Each S had 10 practice runs
using an all-aluminum painted aircraft model against
a clear white background, and 5 others using particular
combinations of paint pattern, background,
and lighting condition. The S, after observing the
aircraft during the 2-second exposure, indicated what
attitude he thought he had seen by referring to a
small display on which were mounted 15 unpainted
models in attitudes corresponding to those presented
by the stimulus models. He was given 15 viewings
(each attitude appeared once) of each of 5 paint
patterns, for a total of 75 problems. The score for
each pattern was the number of attitudes correctly
identified out of the 15 presented.


RESULTS AND DISCUSSION 
Conspicuity Experiments 


The major answer sought by these studies 
was whether paint patterns affect conspicuity 
of aircraft differentially. The paired-compari- 
son method used in these five studies permits 
comparison of patterns only within experi- 
ments. No attempt should be made to com- 
pare the conspicuity score of a pattern in one 
experiment with that of a pattern in any of 
the other four conspicuity experiments. 

Officially Specified Patterns. Table 1 sum- 
marizes the results of the experiment using 
officially specified patterns. The results to the
left of the diagonal are repetitions of those to
the right, and are presented for convenience.
Comparison of the patterns in this and similar
tables following is most easily made by
reading down a column (or across a row), since
patterns are ranked on conspicuity
scores. The Federal Aviation Agency (FAA)
and Navy patterns were superior to all others
and were not significantly different from each
other. The Air Force pattern was not statistically
superior to the plain aluminum,
which was ranked last.

Since the FAA and Navy patterns have
more red-orange fluorescent paint than the
other patterns, the results suggest a positive
correlation between the amount of fluorescent
paint on the aircraft and its conspicuity under
the experimental conditions.

TABLE 1

MEANS (M), STANDARD DEVIATIONS (SD), AND t TESTS FOR DIFFERENCES BETWEEN MEANS OF
CONSPICUITY SCORES FOR OFFICIALLY SPECIFIED PATTERNS

Pattern

FAA
Navy 5.24 1.15
Coast Guard
Air Force
Aluminum


External Contrast. Two of the five patterns 
were designed to agree with the hypothesis 
that patterns offering the greatest brightness 
contrast with the generally lighter sky and 
darker ground would be most conspicuous.4 
Two other patterns were designed on the re- 
verse of this hypothesis. An aluminum model 
served as a control. As Table 2 shows, the 


4 Like most suggested solutions to the aircraft 
conspicuity problem, this is an oversimplification. 
Aircraft can be viewed from above against lighter 
backgrounds (e.g., snow) and from below against 
darker backgrounds (e.g., storm clouds). 


Pattern 


Coast Air 


Guard Force Aluminum 


5.74** 


4.16** 


6.53** 


aa 


7 
5.11%" 5.84% 


4.16** 1.41 


oa. 


5.84** 


two patterns which agree with the hypothesis 
are ranked first and second on conspicuity 
scores and are both significantly different 
from the others. Further, neither of the pat- 
terns which did not fit the hypothesis was 
significantly different from a plain aluminum 
aircraft. This seems to support the hypothe- 
sized method of painting aircraft. 

However, closer examination of procedures 
revealed that the reason the hypothesis was 
supported may not have been as stated in the 
hypothesis. Ss were not always looking to- 
ward the bottom of the aircraft against a 


TABLE 2 

MEANS (M), STANDARD DEVIATIONS (SD), AND t TESTS FOR DIFFERENCES BETWEEN MEANS OF
CONSPICUITY SCORES FOR EXTERNAL CONTRAST PATTERNS 


Red? 
Pattern Black 
Red» 
Black 
White 
Black 5.12 
Black 
Red» 3.44 
Aluminum 3.16 
Black 


White 2.16 


Pattern ® 
White Black 


Black Aluminum 


2.58* 
3.18** 


4.65** 2.01 1.66 


a The color over the horizontal line represents the top of the aircraft; that below was the color of the underside.
b "Red" refers to red-orange fluorescent paint.
* p = .05; df = 24.
** p = .01; df = 24.

bright sky, nor at its top against a dark 
ground. Rather, the models were viewed in 
realistic attitudes. The same five flight atti- 
tudes were used against all backgrounds; it 
so happened that these displayed slightly 
more of the upper portion of the pattern than 
the underside. Each background had nearly 
uniform brightness, and, to some extent, hue. 
This attempt at realism prevented precise 
testing of the hypothesis by the experiment. 

A more likely explanation for the ranking 
of the patterns would be that the high illumi- 
nation from above resulted in a high re- 
flectance (hence greater conspicuity) even 
against bright backgrounds. Dark undersides 
emphasized the shadows under the aircraft at 
all illumination levels. Since airplanes in natu- 
ral settings would generally be viewed with 
considerable illumination from above, accept- 
ance of this explanation does not invalidate 
the conclusion that patterns emphasizing 
light tops and dark undersides are better 
than those with the reverse of this arrange- 
ment. 

Internal Contrast. These patterns were de- 
signed on the hypothesis that conspicuity is 
directly related to the brightness contrast of 
the paint pattern. Table 3 reveals that this 
hypothesis was not supported. Only one of 
the black and white patterns (greatest bright- 
ness contrast) was significantly different from 
plain aluminum, and the first-ranked white-
on-gray, while significantly different from 
plain aluminum, is significantly different from 
only one of the black-and-white patterns. 

No ready explanation can be offered for 
the discrepancy between hypothesis and re- 
sults. Breaking up the masses of black paint 
and white paint may be detrimental to con- 
spicuity. Other studies have indicated that 
massing of paint is better for conspicuity 
(Siegel & Crain, 1961; Skeen, 1958), and 
there is some support for this in the experi- 
ment on sector coding. 

Point Coding. The models in this group, 
except the all-aluminum control, had the 
empennage painted with red-orange fluores- 
cent paint. On the one-point pattern, just the 
empennage was painted. The two-point pat- 
tern also had the forward fuselage painted; 
the three-point pattern had the two wing- 
tips painted; the four-point pattern had the 
wingtips and forward fuselage painted. Re- 
sults already obtained from the experiments 
on specified patterns and external contrast 
suggested that in this experiment, the pat- 
terns should be ranked according to the 
amount of fluorescent paint contained in each. 
Table 4 shows that results conclusively support
this hypothesis. Statistical tests substantiate
the hypothesis. In the cases of the four-points,
three-points, and two-points models,
at least one of the immediately adjacent patterns
is not significantly different from the


TABLE 3

MEANS (M), STANDARD DEVIATIONS (SD), AND t TESTS FOR DIFFERENCES BETWEEN MEANS OF
CONSPICUITY SCORES FOR INTERNAL CONTRAST PATTERNS

Pattern a (rows): White on gray; White on black; Black on white; Aluminum; Black on gray
Pattern (columns): White on black; Black on white; Aluminum

White on gray 5.48 1.83
White on black
Black on white
1.50 83
1.49 74*
2
Aluminum 1.83
Black on gray 2.69 6.08**

6.08**
20
J.44
4.29**
3.44**
4.29**

a The first-named color was the outline, the second the body.
* p = .05; df = 24.
** p = .01; df = 24.


TABLE 4

MEANS (M), STANDARD DEVIATIONS (SD), AND t TESTS FOR DIFFERENCES BETWEEN MEANS OF
CONSPICUITY SCORES FOR POINT-CODED PATTERNS VARYING THE NUMBER OF ASPECTS PAINTED

Pattern (rows): 4 points; 3 points; 2 points; 1 point; No points (aluminum)
Pattern (columns): 3 points; 2 points; 1 point; No points (aluminum)

4 points
3 points 1.85
2.56*
4.44**
2 points
1 point
No points (aluminum)

2.56*
4.44**
51 2.44*
2.13*
11.92**
9.65**
10.02**
243
10.02** 47**

* p = .05; df = 24.
** p = .01; df = 24.


given pattern, but further removed patterns
are significantly different. For the aluminum
and one-point models, even the immediately
adjacent patterns are significantly different.
Under the conditions of the experiment, then,
increasing amounts of fluorescent paint correlated
with increased conspicuity.

Sector Coding. This series of patterns was
developed primarily to aid the pilot to determine
flight attitudes of other aircraft. No
predictions were made as to which patterns
would be most conspicuous. The control pattern
had the forward portion of its fuselage,
its empennage, and both wingtips painted
with red-orange fluorescent paint; for convenience,
this is referred to as the “RRRR”
pattern (indicating left wing, right wing,
nose, and tail, respectively, were painted
with red-orange fluorescent paint). The maximally
coded pattern differentiated left wing

from right wing and front from back; the left 
wing was painted with fluorescent red-orange 
and the right wing with fluorescent green, 
while the front was painted black and the 
back was painted white. This pattern is 
designated “RGBW.” Of the three remain- 
ing patterns, one coded the front and rear of 
the fuselage while the wingtips were left 
uncoded (RRBW), another coded wingtips 
while the front and rear of the fuselage were 
undifferentiated (RGBB), and one differenti- 
ated front from rear and left from right, with 
one wing and the empennage not differenti- 
ated (RGBR). Thus, the codings supplied by 
paint patterns on different parts of the air- 
craft provided a variety of information to ob- 
servers through a variety of combinations of 
colors. 

TABLE 5

MEANS (M), STANDARD DEVIATIONS (SD), AND t TESTS FOR DIFFERENCES
BETWEEN MEANS OF CONSPICUITY SCORES FOR SECTOR-CODED PATTERNS

Rows (Pattern): RRRR; RGBR; RGBB; RRBW; RGBW
t-test columns (Pattern): RGBR; RGBB; RRBW; RGBW

Entries legible in the source, in source order: 1.20 2 4.20**; 4.39**;
1.82 1.19; 4.39**; 9.24**; 7.05**; 9.24**; 3 70**; 3.63**

Note. Letters refer to left wing, right wing, nose, and tail,
respectively: R is red-orange fluorescent; G is green fluorescent;
B is black; W is white.
* p = .05; df = 24.
** p = .01; df = 24.


Table 5, which summarizes the results of this experiment, provides
further evidence that the amount of red-orange fluorescent paint
correlates with conspicuity scores in these experiments. Once again,
the pattern with the most red-orange fluorescent paint exceeded all
other patterns on conspicuity. The pattern with the second greatest
coverage of red-orange fluorescent paint ranked second, although it
was not significantly different from the all red pattern. However, the
pattern with the third largest area of coverage of red-orange
fluorescent paint was ranked fourth, while a pattern having less
fluorescent coverage was ranked third. Differences between the mean
conspicuity scores were not significant, which may indicate that the
difference in amount of red-orange fluorescent paint was not great
enough to make a large difference in conspicuity. This result may also
indicate that factors other than the area of red-orange fluorescent
paint interact and change the conspicuity relationship noticed in the
previous studies.

Effect of other variables. Twenty-five analy- 
ses of variance were performed to determine 


TABLE 6

RANKINGS, MEANS (M), AND STANDARD DEVIATIONS (SD) OF AIRCRAFT PAINT
PATTERNS ON ATTITUDE DETERMINATION SCORES

Column headings: Pattern; Rank (Second, Third, Fourth, Fifth legible)
Row groups, each with M and SD: Officially specified; External
contrast (Top, Bottom); Internal contrast (Outline, Surface); Point
coding; Sector coding (Left wing, Right wing, Nose, Tail)

Entries legible in the source, in source order: Officially specified:
Coast Guard, M 12.00; External contrast, Top: White; FAA, Air Force,
Aluminum; 10.60, 10.40, 9.09; 2.41, 2.07, 2.88; Aluminum, Black,
Black; Aluminum, Aluminum, 11.40, 2.41; Point coding: M 12.00; Black,
11.40, 1.95; Black, White, 10.40, 1.95; 4 (tail, nose, wing tips),
11.60, 55; Red a, Green b, Black, White, 10.80, 1.30; Aluminum, 9 S80,
3.19; White, Black, 10.00, 1.58; (tail, nose); Red a, Green b, Black,
Red a, 10.60, 2.07; White, &. SO; Red a, 9.00; 1.58, 2.59; Black,
Gray, 9.20; White, Gray, 9.80; 2.28, 45; 3 (tail, wingtips), 10.40,
&9; Red a, Green b, Black, Black, 10.40, 2.51; None (aluminum), 10.00,
3.24

a Red-orange fluorescent paint.
b Green fluorescent paint.
the effects on conspicuity scores of aircraft attitudes, backgrounds,
lighting conditions, and subjects. Each individual pattern was treated
separately because the method of paired comparisons leads to equal
row, column, and cell entries if patterns are considered as
replications of the basic Graeco-Latin square. Generally, there were
so few significant findings in these analyses of variance that
conspicuity scores seem to have been unaffected by attitudes,
backgrounds, lighting, or subjects. Only 10 F ratios of the 100
computed were significant; backgrounds were not statistically
significant for any pattern; and none of the variables within the
external contrast patterns was significant.

These results point up an important prob- 
lem of developing an effective paint pattern. 
While a particular pattern may be especially 
effective under a certain lighting condition or 
against a certain background, when evaluated 
across the wide variety of lighting and back- 
ground conditions found in flight, its advan- 
tage over other patterns becomes negligible. 


Attitude Experiments 


Table 6 summarizes the rankings of the 
paint patterns in each experiment. Since there 
was no relative motion, Ss had to depend 
heavily (if not entirely) on aircraft aspect 
to make their attitude determinations. Even 
when aspect was enhanced by distinctive cod- 
ing of right and left wings, nose, and tail (the 
second-ranking pattern of the sector-coded 
study), Ss found it easier (though not sig- 
nificantly so) to judge attitudes of a model 
on which sector codings were not given. The 
t tests for individual differences between 
means reveal only one which is significant. 
The difference between the model with red- 
orange fluorescent top, black underside, and 
the model with these colors reversed was
significant at the .05 level.

The findings that only two patterns yielded significantly different
mean scores and that there was no improvement in scores when aircraft
aspect was enhanced would seem to indicate that the paint patterns
tested make only a negligible contribution to pilot judgments of
aircraft attitude.

Of the other sources of variation examined 
in the analyses of variance, differences among 
backgrounds were significant only in the ex- 
periment on officially specified patterns, and 
differences among subjects were significant 
only in the experiment on sector coding. 

The significant F ratio for backgrounds in 
the experiment using officially specified pat- 
terns resulted from only one large difference 
—that between the mean attitude score with 
a background of blue sky and that with a 
background of pine forest. Attitude deter- 
mination was easier against the blue sky. 
Since differences among backgrounds were not 
significant in any other experiment, and since 
only one of the individual t tests was
significant, it seems more likely that
backgrounds (at least the kind used in these 
experiments) have little influence on attitude 
determination. Thus, in this case where con- 
ditions were above threshold, and the ex- 
posure time sufficiently long that Ss had little 
or no difficulty locating the model and making 
their judgments of flight attitude, environ- 
mental effects were not great. 
DIFFERENCES BETWEEN NORMAL AND UNDERACHIEVERS OF SUPERIOR ABILITY 1


FREDERICK J. TODD, GLENN TERRELL, AND CURTISS E. FRANK


University of Colorado 


An attempt was made to obtain descriptive information about bright normal 
achievers and underachievers with respect to 4 theory-related variables. 4 hy- 
potheses were stated predicting that, as compared with normal achievers, un- 
derachievers would manifest less need for academic achievement, would be less 
likely to have decided on specific vocational goals, would be less likely to
perceive a relationship between coursework and attainment of goals, and would 
have lower expectancy for success in academic pursuits. The results obtained 
through the administration of 2 personality inventories and a specially devised 
questionnaire provided some support for all 4 hypotheses for male Ss. For fe- 
male Ss, however, support was found for only 2 of the 4 hypotheses. The re- 
sults were discussed with reference to sex differences. 


This study represents the beginning stage 
of a long range investigation of the under- 
achieving college student. The purpose of the 
study was to obtain information concerning 
certain nonintellective factors in underachieve- 
ment. 

In the past several years, much attention 
has been given to the problem of the under- 
achiever. Studies such as those of Weitz and 
Wilkinson (1957) have explored family conditions
and educational experiences. Others
have sought to discover the relationship of
early training and experience to underachievement
(Baldwin, Kalhorn, & Breese, 1945;
Winterbottom, 1953). By far the largest num- 
ber of studies, however, have attempted to 
differentiate achievers from underachievers on 
the basis of personality characteristics as 
measured by various tests and inventories 
(e.g., Burgess, 1956; Gebhart & Hoyt, 1958; 
Merrill & Murphy, 1959). 

By and large, the measuring instruments and the characteristics
assessed have been selected on the basis of intuition or the
availability of the measuring instruments. As a consequence, the
results of these studies provide important but isolated bits of
information which are difficult to interpret within a single coherent
systematic position.

1 This research was supported by grants from the Council of Research
and Creative Work and the Committee on Coordination of Research, both
of the University of Colorado. The authors wish to acknowledge their
indebtedness to Vincent N. Campbell, who assisted greatly in the
planning and execution of this research. The help of David Muirhead,
Director of Admissions at the University of Colorado, and of James
Byrum, manager of the IBM Service, is also gratefully acknowledged.

The present study has been carried out 
within a social learning theory orientation 
modified after Rotter (1954). This has been 
done in the hope that the information ob- 
tained in this and projected studies on the 
underachiever will be amenable to integra- 
tion from a single point of view. A conse- 
quence of this theoretical bias is that certain 
factors have been chosen for investigation 
rather than others. These are: affectional 
versus recognition needs, presence of long 
range goals, expectancy for certain academic 
pursuits to lead to attainment of goals, and 
expectancy for success in academic activities. 

The approach taken in this study was that 
of comparing underachievers with normal 
achievers by testing hypotheses calling for dif- 
ferences between the two groups with respect 
to the four variables given above. The authors 
wish to make clear that the specific hypothe- 
ses which follow were not derived in any for- 
mal sense from the theoretical position they 
espouse. Indeed, these, and a multitude of 
other hypotheses may be easily obtained from 
a perusal of the literature. Nevertheless, it is 
emphasized that the particular variables with 
which these hypotheses are concerned as well 
as the instruments used to obtain the infor- 
mation are a consequence of the authors’ so- 
cial learning theory orientation. 
Four hypotheses were investigated. They
are: 

Hypothesis I. Underachievers will reveal 
greater need for love and affection in social 
situations and less need for recognition status 
in academic situations than will normal 
achievers. 

Hypothesis II. Fewer underachievers will 
have decided on a specific vocational goal 
than normal achievers. 

Hypothesis III. A larger proportion of nor- 
mal achievers than underachievers will per- 
ceive their doing well in their course work as 
important to the attainment of their long 
range goals. 

Hypothesis IV. Underachievers have lower 
expectancy of doing well in academic activi- 
ties than do normal achievers. 

In order that this, the initial stage of the 
research, be kept more manageable, it was de- 
cided that only the bright under- and normal 
achievers would be studied. It is quite likely 
that intelligence interacts with factors that 
produce underachievement. Different levels of 
brightness will be studied in later stages of 
the research. 

PROCEDURES 


The subjects used in this study were sophomores, 
juniors, and seniors enrolled in the College of Arts 
and Sciences at the University of Colorado in the 
spring of 1960. The subjects were selected in the fol- 
lowing manner. 

All students with Academic Aptitude Test (AAT) 2 scores at the
eightieth centile or above were identified. Of these students, those
whose grade point average was 3.00 or above (on a four-point scale on
which 0 = F and 4 = A) were classified as normal achievers. Those
whose grade point average was below 2.00 were classified as
underachievers. In this manner, 332 subjects were selected, 227 of
which were normal achievers and 105 of which were underachievers.
Letters were mailed to each of these students inviting them to
participate in a study of the curricula and requirements of the
College of Arts and Sciences. Two hundred forty-four, or 74% of the
selected students agreed to participate. This final sample consisted
of 177 normal achievers and 67 underachievers. Sixty-six of the normal
achievers and 39 of the underachievers were male, while 111 of the
normal achievers and 28 of the underachievers were female. Thus, the
total group of subjects was comprised of 105 males and 139 females.
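
The selection rule and the sample counts reported above can be checked with a short sketch. The counts are transcribed from the paragraph; the function name `classify`, the helper `totals_by`, and the dictionary layout are illustrative assumptions and are not part of the original study.

```python
# Counts transcribed from the Procedures paragraph above.
SELECTED = {"normal": 227, "under": 105}
PARTICIPANTS = {
    ("normal", "male"): 66,
    ("normal", "female"): 111,
    ("under", "male"): 39,
    ("under", "female"): 28,
}


def classify(aat_centile, gpa):
    """Apply the selection rule as described in the text (hypothetical helper).

    Students below the 80th AAT centile, or with a grade point average
    from 2.00 up to (but not including) 3.00, were not selected: None.
    """
    if aat_centile < 80:
        return None
    if gpa >= 3.00:
        return "normal"
    if gpa < 2.00:
        return "under"
    return None


def totals_by(index):
    """Sum participant counts over one key position (0 = group, 1 = sex)."""
    out = {}
    for key, n in PARTICIPANTS.items():
        out[key[index]] = out.get(key[index], 0) + n
    return out


total_selected = sum(SELECTED.values())
total_participants = sum(PARTICIPANTS.values())
by_group = totals_by(0)
by_sex = totals_by(1)
response_rate = total_participants / total_selected

print(total_selected, total_participants)
print(by_group, by_sex)
print(f"response rate: {response_rate:.1%}")
```

The marginal totals reproduce the figures in the text (332 selected; 244 participants; 177 and 67 by group; 105 and 139 by sex), and the ratio 244/332 comes to roughly 73.5%, in line with the 74% quoted.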

2 The Academic Aptitude Test (AAT), largely a verbal test, was
developed as an entrance examination at the University of Colorado.
It has a correlation of +.50 with grade point average.
The subjects were assembled in three groups and administered two
inventories and a questionnaire, each of which was designed to elicit
information directly relevant to one or more of the hypotheses to be
tested in this study.


Instruments 


The three instruments administered were the Goal Preference Inventory
(GPI), developed by Liverant (1958), the Inventory of Expectations
(IE), developed by Jessor and Mandell (Mandell, 1959), and the
Vocational Goal Questionnaire (VGQ), devised specifically for this
study.

The GPI, selected to test Hypothesis I, was constructed within
Rotter’s social learning theory framework to measure relative
strengths of need for academic achievement and need for social love
and affection. It contains 60 forced-choice items such as the
following:

I most prefer to

1a. Try to get into plays, on radio, etc.
b. Volunteer for oral reports in the classroom.

2a. Take a course where the instructor teaches on a very high level.
b. Be patient with and tolerant of a friend’s more objectionable
characteristics.


The IE, selected to test Hypothesis IV, was also developed within the
framework of Rotter’s theory. Its purpose is to measure expectancy for
success in the areas of academic achievement and attainment of social
love and affection. Like the GPI, its items are of the forced-choice
type, samples of which are:

I most expect

1a. To do favors for others.
b. To be able to contribute something worthwhile in class discussion.

2a. To sit with friends in class.
b. To help get dates for my friends.


The VGQ was devised to obtain information pertaining to Hypotheses II,
III, and IV. It consists of eight questions, six of which are
accompanied by a four-point rating scale. The items of this
questionnaire will be described in the results section.


Statistical Analysis 


The data yielded by the GPI and IE are of such a nature as to make t
tests of significance appropriate, while chi square tests of
significance were used in analyzing the frequency data obtained by the
VGQ. Because there is some evidence that sex interacts with factors
affecting academic achievement (Shaw & Grubb, 1958; Summerskill &
Darling, 1955), comparisons specific to each hypothesis were made
between the normal and underachieving groups for each sex separately.
As a matter of interest comparisons were made with the sex groups
combined. These results are not reported in relation to each
hypothesis but are noted in the final paragraph of the following
section.
TABLE 1

DIFFERENCE IN GPI AND IE ACADEMIC RECOGNITION SCORES BETWEEN NORMAL
AND UNDERACHIEVERS

Rows (Sex): Male Ss: Normal, Under; Females: Normal, Under
Column headings legible in the source: GPI; SD
(cell values not preserved in the source)

* p < .02
** p < .01
*** p < .001


RESULTS


Hypothesis I


The GPI provides a score for need for aca- 
demic recognition (AR) and a score for need 
for social love and affection. The first of these 
is determined by counting the number of re- 
sponses in which the academic need alterna- 
tive was chosen. The second score is deter- 
mined by subtracting the academic need score 
from the total number of items. Because these 
scores are complements of each other, a test 
of significance is required for only one of 
them. 
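
The scoring rule just described can be sketched in a few lines. The item count of 60 is taken from the description of the GPI given earlier; the response coding (`"academic"` / `"social"`) and the function name `score_gpi` are assumptions made only for illustration.

```python
N_ITEMS = 60  # number of forced-choice items, as stated in the text


def score_gpi(responses):
    """Return (academic recognition score, social love and affection score).

    `responses` is a sequence of 60 strings, each "academic" or "social",
    naming the alternative chosen on that item (hypothetical coding).
    """
    responses = list(responses)
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in ("academic", "social") for r in responses):
        raise ValueError("each response must be 'academic' or 'social'")
    ar = sum(1 for r in responses if r == "academic")
    # The second score is the complement of the first.
    return ar, N_ITEMS - ar


# Illustrative protocol: 26 academic choices, 34 social choices.
example = ["academic"] * 26 + ["social"] * 34
ar_score, social_score = score_gpi(example)
print(ar_score, social_score)
```

Because the two scores always sum to 60, any difference between groups on one score is mirrored with opposite sign on the other, which is why a single test of significance suffices.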

In Table 1 are presented the results of the 
analysis of the GPI. As predicted, the need 
for AR mean score for male achievers (25.9) 
is significantly higher (p < .001) than that 
for male underachievers (21.59). In females, 
however, the difference in mean AR scores be- 
tween normal and underachievers (23.59 and 
23.18, respectively) does not reach signifi- 
cance. 


Hypothesis II


The VGQ contains two items which are 
relevant to this hypothesis: 


1. What would you most like to be if you could 
be anything you wanted to be? (If you know what 
you would like to be under these ideal circumstances, 
write your choice in the space below. If you do not
know what you would like to be under these ideal 
circumstances, write “Don’t know” in the space be- 
low). 

Because what we become is frequently 
limited by reality factors (lack of oppor- 
tunity, money, a specific talent, etc.), what 
we actually intend to be may (or may not) 
differ from what we would like to be if there 
were no restrictions imposed by reality. Thus, 
the second question is: 


2. What do you actually intend to try 
(Circle the appropriate letter.) 
a. Same choice as in Question 1. 
b. Different from Question 1. (Please 
occupational intention.) 
c. Have not yet decided 


TABLE 2

FREQUENCIES OF NORMAL AND UNDERACHIEVERS ON ITEMS RELATED TO
HYPOTHESIS II

Column headings: VGQ Item; Normal achievers (Decided, Undecided);
Underachievers (Decided, Undecided)
(cell values not preserved in the source)
The first item is designed to identify those
subjects who have, at least, contemplated an 
ideal vocational goal for themselves, while the 
second identifies those who have actually es- 
tablished a vocational goal. 

In Table 2, it will be observed that significantly more female normal
achievers (p < .01) have identified ideal vocational goals than female
underachievers, whereas no significant difference exists between male
normal and underachievers.

In order to discover if there were differences between normal and
underachievers as to whether or not they had actually selected a goal,
the responses in Categories a and b of Item 2 were combined and tested
against the responses of Category c for male and for females. Table 2
reveals that for the males, significantly more normal achievers have
decided on vocational goals than have underachievers (p < .05). For
the female subjects, the difference between normal and underachievers
does not reach significance (p .10).

Hypothesis III 


The VGQ contains three items relevant to 
the hypothesis. They are: 


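[Item wording is only partly legible in the source. Legible fragments,
in source order: “... your present ...”; scale steps “Fairly ...”,
“... portant”, “Quite”, “unimportant”, “important”; “... uch do you
think your present course of ... contribute toward your future success
in ...”; scale steps “D”, “Very little”, “Very ...”.]
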
The first item pertains directly to vocational goals and could be
asked only of those subjects who had stated a vocational choice. Two
hundred and eight subjects had done so in response to the two items
pertaining to Hypothesis II.

In applying the χ² test to the data obtained from this item, the
responses to Scale Steps A, B, and C were combined and tested against
those obtained for Scale Step D. This was done because when the data
for males and females were analyzed separately, relatively few A and B
alternatives were chosen by the subjects, resulting in theoretical
frequencies of less than 5, even for the combined A and B categories.
Also, the dichotomy between “Quite important” and all other scale
steps seemed as defensible as any of the other alternative dichotomies
in a test of the hypothesis.3
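
The reasoning behind collapsing the scale steps can be illustrated with a small sketch: sparse categories yield theoretical (expected) frequencies below 5, whereas the dichotomy of Step D against Steps A, B, and C combined does not. The response counts below are hypothetical, invented purely for illustration, and are not the study's data; the uncorrected Pearson statistic is likewise an assumption, since the text does not say whether a continuity correction was applied.

```python
def expected_frequencies(table):
    """Theoretical cell frequencies: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for a contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def collapse_to_dichotomy(counts):
    """Combine Steps A, B, and C and set them against Step D."""
    return [counts["A"] + counts["B"] + counts["C"], counts["D"]]


# HYPOTHETICAL counts per scale step (not taken from the study).
normal = {"A": 1, "B": 2, "C": 12, "D": 35}
under = {"A": 2, "B": 3, "C": 13, "D": 12}

full = [[normal[k] for k in "ABCD"], [under[k] for k in "ABCD"]]
collapsed = [collapse_to_dichotomy(normal), collapse_to_dichotomy(under)]

min_full = min(min(row) for row in expected_frequencies(full))
min_collapsed = min(min(row) for row in expected_frequencies(collapsed))
chi2 = pearson_chi_square(collapsed)

print(f"smallest expected frequency, four steps: {min_full:.3f}")
print(f"smallest expected frequency, dichotomy:  {min_collapsed:.3f}")
print(f"chi-square (df = 1): {chi2:.3f}")
```

With these illustrative counts the four-step table contains expected frequencies below 5, while the collapsed 2 x 2 table does not; the resulting statistic can then be referred to the χ² distribution with 1 degree of freedom (critical value 3.841 at the .05 level).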


The results of the analysis may be found in 
Table 3. It will be noted that, as predicted, 
for the male subjects a significantly greater 
proportion of normal achievers checked the 
“Quite important” alternative than did
underachievers (p < .02). For the female subjects,
however, the difference between normal
and underachievers, though in the expected 
direction, was not significant (p = .20). 

The second and third items listed above concern course work in
relation to very abstract, long-range goals. Both of these items were
responded to by all subjects.


Analysis of the results of these items followed the same pattern as
for the preceding item. That is, the frequencies of responses obtained
for the most positive scale step (Step A in these two items) were
compared with the frequencies obtained for the other three scale steps
combined. These results are presented in Table 3.

For the first of these two items (relation of coursework to future
success in life) male normal achievers revealed a significantly
greater proportion of responses to the most positive scale step than
did male underachievers (p < .05). Again no significant difference was
found between female normal and underachievers. On the second of these
last two items (relation of course work to future happiness) normal
achievers did not differ significantly from underachievers in either
the male or female groups.

In summary, Hypothesis III receives support from the results obtained
from the male group on the first two items, but not on the third. No
support was obtained for the hypothesis from the female group.

3 The χ² resulting from the trichotomy, AB, C, and D, yielded the same
significant and nonsignificant differences, although actual
significance levels varied slightly. This was also the case with all
other similar comparisons.


TABLE 3

FREQUENCIES OF NORMAL AND UNDERACHIEVERS ON VGQ ITEMS RELATED TO
HYPOTHESIS III

Column headings legible in the source: VGQ Item; Normal achievers;
Remaining
(cell values not preserved in the source)



Hypothesis IV 

The IE of Mandell provided information relevant to expectancy for AR.
The VGQ contains one item relevant to expectancy for success in
attaining vocational goals and two items pertaining to expectancy for
success in course work. The three VGQ items are:

1. All things considered, how likely do you think i... that you will
actually be ...

It will be noticed that the two items pertaining to expected success
in course work differ in that the first concerns expectancy for
success under present conditions, while the second pertains more
directly to the subject’s evaluation of his ability to achieve success
if he did his best.

The results obtained from the IE are presented in Table 1. The mean AR
scores of normal achievers are significantly higher than those for the
underachievers in both the male and female groups (p < .01 and
p < .02, respectively). Accordingly, normal achievers are judged to
have greater expectancy for academic success than are underachievers
as predicted by Hypothesis IV.

In keeping with the procedures employed in analyzing the data obtained
in connection with Hypothesis III, the responses obtained for the most
positive scale steps of the VGQ items (Steps A, A, and D,
respectively, of the three items under consideration) were compared
with those obtained for the other three steps. Responses to the first
item were obtained only from those subjects who stated a vocational
choice in response to the items relevant to Hypothesis II. All
subjects responded to the other two items.


The results are given in Table 4. The responses to the first VGQ item
(expectancy for attaining vocational goal) did not differ
significantly between normal and underachievers for either the male or
female groups. Thus, as concerns vocational goals, Hypothesis IV was
not supported.


TABLE 4

RESPONSE FREQUENCIES OF NORMAL AND UNDERACHIEVERS TO VGQ ITEMS RELATED
TO HYPOTHESIS IV

Column headings: VGQ Item; Normal achievers (Most positive scale step,
Remaining scale steps); Underachievers (Most positive scale step,
Remaining scale steps)

Entries legible in the source, in source order: 11 02; 12 19; 16.36**;
6.98**; 3.56; 4.20*

* p < .05
** p < .01


On the second item (expectancy for success in course work with present
effort) a significantly greater proportion of normal achievers than
underachievers selected the most positive response in both the male
and female groups (p < .001).

Responses to the third item (expectancy for success in coursework with
maximum effort) differentiate female normal achievers from female
underachievers in accordance with the hypothesis (p < .05). In male
subjects, however, the difference does not reach significance. This is
the second of the only two instances in which a significant difference
was found for the female group but not for the male group.

The results obtained pertaining to Hypothesis IV may be summarized as
follows: As regards expectancy for success in academic achievement,
the results of the IE and the second VGQ item support, in both the
male and female groups, the prediction that normal achievers will have
greater expectancy for academic success than will underachievers. The
third VGQ item supports this prediction only with respect to female
subjects. As regards expectancy for success in achieving an intended
vocational goal, the results of the first VGQ item provide no support.

As regards the results of this study as a whole, it is of interest to
note that all differences found (whether significant or not) were in
the predicted direction, and tests of significance computed for the
data of the combined male and female groups yield significant results
supporting the respective hypotheses in every case except for the
first item relevant to Hypothesis IV.


DISCUSSION 


The results of this study provide, in varying degrees, support for all
four hypotheses and thus, in some measure, differentiate bright normal
achievers from bright underachievers with respect to the four factors
of goals, needs, expectancy for success, and expectancy that certain
activities will lead to certain desired goals. However, as in other
studies (Shaw & Grubb, 1958; Summerskill & Darling, 1955), it was
found that the results differ depending on the sex of the subjects,
significant differences being found in six of the 10 comparisons for
male subjects and in but three of the 10 for female subjects. On only
one item was a significant difference found in both sex groups.

Looking at the results in another way, some support is found for all
four hypotheses in the results of the male group, while in the female
group, support is found only for Hypothesis II (concerning vocational
choice) and Hypothesis IV (concerning expectancy for academic
success). Clearly any attempt to understand the underachiever must
sooner or later account for these sex differences. Ultimately further
research directed toward sex differences must be undertaken.
Nevertheless, tentative explanatory suggestions may be proffered on
the basis of results found by others. In this connection, an
examination of some of the interaction of sex with the factors
investigated in this study may be profitable. It is recognized,
however, that attempted explanations of these interactions can, at
this point, be little more than speculative hypotheses.

Hypothesis I recognizes needs as a basic motivating factor in
behavior. Although behavior itself is a function of other variables as
well (e.g., expectancies and goals), it seems reasonable to expect
that normal and underachievers will differ as regards the needs for
which they seek satisfaction. The need for achievement has been
postulated by Murray (1938), Edwards (1954), McClelland (1953), and
Rotter (1954) as an important motivating factor in behavior. Moreover,
the studies of Gebhart and Hoyt (1958) and Merrill and Murphy (1959)
report differences in this need as measured by the Edwards Personal
Preference Schedule, for groups differing in achievement indices based
on academic performance. In these studies, there is implicit
identification of achievement with academic achievement, and no
attempt was made to eliminate the influence of expectancy for success.
The GPI, employed in the present study, is designed to measure
specifically the strength of need for academic achievement relative to
the need for social love and affection, and attempts, through
instruction, to hold expectancy for success constant.

Within the restrictions imposed by the sample of subjects used in this
study, an incompatibility is suggested for the male group between
behavior leading to satisfaction of the need for social love and
affection and satisfactory academic performance. No such
incompatibility may be inferred from the results obtained from the
female subjects. This suggests the possibility that female college
students can take their social needs “in stride,” without their
interfering to serious degrees, at least, with their academic
achievement. Male college students apparently are unable to do this.
Perhaps this is a reflection of the hypothesis suggested by McClelland
(1953) that a higher level of scholastic performance is expected of
girls than boys at all levels of development. The latter has some
support in the general finding that underachievement is more
frequently found in males than females. Also Shaw and McCuen (1960)
found that underachievement is present in males when they enter
school, while it apparently does not begin with females until the
junior high school years. It seems reasonable to assume that, if
McClelland’s suggestion has merit, girls would be better prepared than
boys to meet their social needs without an accompanying detrimental
effect on their academic work.

Hypothesis II is based on the assumed importance of goals in
organizing behavior in sequential patterns and permitting delay of
need gratification in the service of subsequent greater or more
lasting satisfaction. Vocational goals are but one class of long-range
goals, and because cultural expectations as regards vocational choice
differ for the sexes, it does not seem surprising that the
relationship of vocational choice to academic achievement differs for
men and women. In this study, in the female group only did a greater
proportion of normal than of underachievers indicate an ideal
vocational choice, whereas for male subjects only did a greater
proportion of normal than of underachievers indicate an actual
vocational choice. It would appear that having a vocational goal does
not favorably affect the male student’s behavior as regards academic
achievement unless the goal is one which he actually intends to
attain. For the female subjects, however, merely having an ideal
vocational goal, which she may never expect to attain, seems somehow
to affect her achievement favorably, possibly because it provides a
basis for selecting courses interesting to her and gives meaning and
continuity to her various courses. For the female student, an expected
vocation may be “housewife and mother”; the idealized vocational goal
merely keeps the intervening academic experiences organized until
marriage.

The importance of seeing the relationship of certain courses of action
to attainment of certain goals is reflected in Hypothesis III. The
fact that support for the hypothesis was found in only the male
subjects may be due in large measure to the nature of the particular
items used in testing the hypothesis. The first item concerned the
relationship of coursework to attainment of a vocational goal.
As suggested earlier, it is quite likely that
many women students expect their ultimate 
occupation to be that of housewife and 
mother. And it appears reasonable to assume 
that it is somewhat more difficult to perceive 
a direct connection between the specific con- 
tent of college courses and this occupation 
than between academic courses and careers
in a business or profession. However, if this
is so, there is apparently scant recognition of
the importance of an intellectual-cultural orientation
to the mother who hopes that these
values will be inculcated in her children. The
same sort of considerations as were applied
to the first item may be applied to the second.
The second item used in testing Hypothesis 
III concerned the relationship of coursework 
to future success in life. It is not unlikely that 
success is thought of by males in terms of 
professional goals, while for females, success 
is to a greater extent related to marriage and 
family. 

The third item related to Hypothesis III
considered coursework as related to future
happiness. Because this goal is quite abstract
and general, and because happiness is contingent
on so many factors, it is perhaps not
surprising that significant differences between
normal and underachievers were found for
neither sex group.


The assumed importance of expectancy for
success in influencing behavior gave rise to
Hypothesis IV, and, as with the other hypotheses,
no differences were predicted on the
basis of sex. A discrepancy in the results was
found, however, with respect to the last item
devised to test this hypothesis. This item has
to do with expectancy for success in coursework
if maximum effort were made to do well.
For the female group only is there a significant
difference between normal and underachievers.
This finding is difficult to explain.

It is almost as if there is a tendency in
the female underachiever to regard her lack of
achievement as a reflection of her ability, at
least in the areas covered by her courses,
whereas in the male underachiever, the tendency
is to perceive his lack of achievement as
merely a lack of sufficient effort on his part.
From this rather tenuous supposition, one
might speculate that ability or potential in
academic pursuits is not as vital to the self-concept
and self-esteem for a female as it
is for a male. Perhaps endowments in other
areas are more central to the feminine role.

RATER ACCURACY AS A GENERALIZED ABILITY 1


CECIL J. MULLINS and RONALD C. FORCE

Air Force Systems Command, Lackland Air Force Base, Texas


To see if those people who can most accurately estimate their peers’ performance
on an objective criterion are also those who can most accurately rate their
peers on carefulness, 236 basic airmen estimated the scores their peers made on
a vocabulary test. Then they rated their peers on carefulness, and all Ss took
5 carefulness tests. All 5 of the carefulness tests correlated higher with the ratings
assigned by airmen who most accurately estimated their peers’ vocabulary
scores than they did with ratings assigned by the airmen who least accurately
estimated their peers’ vocabulary scores. These results were interpreted as a
demonstration of the generalizability of rating accuracy.

1 The research reported in this paper was sponsored
by Personnel Laboratory, Aeronautical Systems Division,
under AFSC Project 7717, Task 87002.


Intuitively, most of us feel that there must
be individual differences in the ability of
raters to rate accurately. In any group of
raters, there must be individuals who can
consistently rate other people more objectively,
more precisely, and more accurately
than other raters are able to do. When we
use rating scores as a criterion against which
to validate predictor instruments, we usually
pool all the rating scores we can get, be they
accurate or be they not so accurate, and use
some sort of average of these pooled ratings
as our criterion score.

If raters do, indeed, vary consistently in
their rating accuracy, and if we can devise
some way of picking out the good ones, we
would be in considerably better position for
our validation studies.

This study is an attempt to identify the
more accurate raters in a group and to see
whether or not those raters who are more accurate
at rating on one characteristic are also
more accurate at rating another unrelated
characteristic.


RATIONALE


In order to determine which raters are
more accurate than other raters, one has to have
some anchor for judging accuracy. In other
words, the rating on some characteristic must
be accompanied by some other measure of
that characteristic with unquestionable validity.
In this study, that anchor was provided
by the scores made by the subjects on
a vocabulary test. After the vocabulary test
was taken, each man was required to estimate
the scores of each of his peers on that same
vocabulary test.

Once an identification has been made as to 
which raters in the group are the most ac- 
curate ones on estimating vocabulary, it then 
remains to see if they are also the most ac- 
curate raters on some other characteristic (in 
this case, carefulness). 

If one can assume that a number of tests
designed to measure carefulness do indeed
measure carefulness to some extent, and if
one can assume that the people in a group
of subjects vary in their ability to rate carefulness,
and if one can further assume that
those who can estimate a man’s vocabulary
are also the ones who can estimate his carefulness,
then one can expect the correlations
between the carefulness test scores and the
carefulness ratings assigned by the more accurate
raters (defined by their accuracy in
making vocabulary estimates) to be generally
higher than the correlations between the carefulness
test scores and the carefulness ratings
assigned by the less accurate group.


METHOD


The subjects used in this study were basic airmen 
in six consecutive flights reporting for experimental 
testing at the personnel laboratory. Each flight lives 
in a separate two-story barracks. During the experi- 
mental testing session, the airmen were required to 
indicate on their record cards whether they lived on 
the first or second floor of their barracks, so that 
proper rating groups could be set up later. During
this session, a vocabulary test was administered to
all the airmen, along with the following five experimental
carefulness tests:

Score Checking (PL 8016). The subject is presented
a sheet showing 15 two-digit scores for each
of 25 persons. On the back side of the same page is, 
supposedly, a reproduction of the first page, which 
the subject must check for transcription errors. There 
are 64 score discrepancies. 64 items, power. 

Counting Accuracy (PL 8015). The subject is pre- 
sented with a series of rectangles 1” X 14”. In each 
rectangle are 75-89 small dots. The subject must 
count the dots. 36 items, power. 

Clerical Carefulness (BE 464MX). The subject is 
presented a page of 50 rows and 15 columns of three- 
digit numbers. He is required to find and respond to 
the highest number in each row. 49 items (plus 1 
practice), power. 

Letter Comparison (PL 0011). This test requires 
the subject to compare a group of randomly typed 
letters on the left side of the page with a similar 
group of letters on the right side of the page, and to 
mark whether the two groups are identical. Each pair 
of letter groups constitutes one item, and there is 
never more than one letter difference between the
groups. 20 items, power.

Letter Counting (PL 0092). Each item of this test
consists of a line of randomly typed letters, in which
the letter “e” appears 1-15 times. The subject is required
to count the e’s in each line and mark his answer
sheet accordingly. 50 items, power.

Odds-evens reliability coefficients for these tests,
corrected for length by the Spearman-Brown prophecy
formula, are given in Table 1.
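The odds-evens reliability with the Spearman-Brown correction can be sketched as follows. This is an illustrative sketch only, not the authors' computation: the function names and the right/wrong (1/0) item format are assumptions, and the correction shown is the standard prophecy formula, r' = k r / (1 + (k - 1) r), with k = 2 for doubling the half-test length.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2.0):
    """Estimate reliability of a test `factor` times as long as the half-test."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def odd_even_reliability(item_scores):
    """item_scores: one list of item scores (assumed 1/0) per subject."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))
```

For example, a half-test correlation of .50 corresponds to an estimated full-length reliability of about .67.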


Approximately 2 weeks after the testing, peer rat- 
ings on carefulness were collected from the same air- 
men who took the experimental testing. The ratings 
were collected on a five-point scale, with one re- 
striction placed upon the raters. They were required 
to place no more than six of their buddies in any 
one of the five categories. This restriction was im- 
posed to force some separation in the ratings. 

At the same time that the carefulness ratings were 
collected, the airmen were also required to estimate 
the actual scores that each of their buddies made on 
the Vocabulary-C test, a copy of which was given to 
each man to refresh his memory. They were told
that the lowest score made in that flight was 9 and
that the highest score made was 36. This was a
purely arbitrary range, and was given to the airmen
only to furnish a frame of reference to make their
estimation task somewhat easier.

The Category IV airmen (those in the lowest aptitude
category) were eliminated from the study. A
few airmen missed either the testing session or the
rating session, so that the final N for the entire group
was 236. There were 12 rating groups, varying in
size from 16 to 26.


When all the data had been collected and scored, 
each man’s estimate of the performance of each of 
his buddies on the vocabulary test was compared 
with the actual performance of each of his buddies, 
and a difference score was assigned to each estimated- 
actual combination. These difference scores were then 
averaged to provide an average “miss” score for each
man when he estimated the vocabulary of all his
peers. 

On the basis of this average difference score as- 
signed to each rater, the one-quarter of each rating 
group who made the most accurate average vo- 
cabulary estimates was obtained, and the one-quar- 
ter of each group making the least accurate average 
estimates was obtained. 

The average carefulness rating of a given subject 
provided by the subjects designated as high accurate 
raters (on the basis of vocabulary estimation) was 
obtained for each of the 236 subjects in the sample. 
A similar computation was carried out using ratings 
provided by the group of low accurate raters. 
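The scoring steps described above can be sketched in a few lines. This is a hypothetical illustration rather than the authors' procedure: the data layout and function names are invented, the difference score is assumed to be an absolute difference, and the handling of group sizes not divisible by four is also an assumption.

```python
from statistics import mean


def miss_scores(estimates, actual):
    """Average absolute estimate-minus-actual difference for each rater.

    estimates: {rater: {peer: estimated vocabulary score}}
    actual:    {peer: actual vocabulary score}
    """
    return {
        rater: mean(abs(est - actual[peer]) for peer, est in peers.items())
        for rater, peers in estimates.items()
    }


def split_raters(miss, fraction=0.25):
    """Return (most accurate, least accurate) raters within one rating group."""
    ranked = sorted(miss, key=miss.get)          # smallest miss score first
    k = max(1, int(len(ranked) * fraction))      # assumed rounding rule
    return ranked[:k], ranked[-k:]


def average_rating(ratings, raters, subject):
    """Mean carefulness rating given to `subject` by the listed raters."""
    values = [ratings[r][subject] for r in raters if subject in ratings[r]]
    return mean(values) if values else None
```

Calling `split_raters` once per rating group, and then `average_rating` once with each of the two rater lists, would give every subject the two carefulness rating scores that are compared in the results.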


RESULTS 


Intercorrelations were computed between 
all the variables (five carefulness tests, the 
vocabulary test, and the two sets of care- 
fulness ratings). These intercorrelations are 
given in Table 1. 

A separate composite carefulness test score 
was devised, which consisted of the number 
of times a subject appeared in the top half of 
the distributions of the five separate careful- 
ness test scores. This composite carefulness 
score was then correlated with each set of 
the carefulness ratings. These validities, as 
well as the separate validities for each of the 
carefulness tests, are given in Table 2, for 
both the more accurate and the less accurate 
groups. 
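The composite carefulness score can be illustrated with a short sketch. The data layout and function name are hypothetical, and "top half" is assumed here to mean strictly above the median of each test; the source does not say how ties at the median were handled.

```python
from statistics import median


def composite_carefulness(scores_by_test):
    """Count, for each subject, the tests on which he falls in the top half.

    scores_by_test: {test name: {subject: score}}
    Returns {subject: number of tests with a score above that test's median}.
    """
    composite = {}
    for scores in scores_by_test.values():
        cut = median(scores.values())
        for subject, score in scores.items():
            # Assumption: a score exactly at the median is not "top half".
            composite[subject] = composite.get(subject, 0) + (1 if score > cut else 0)
    return composite
```

With five carefulness tests, the composite would range from 0 to 5 for each subject.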

The correlational technique employed was 
one described by Schutz (1960). The r’s given
are product-moment r’s estimated from a phi 
situation by the following formula: 


r = 6.28 
1 


— 15/ 


This correlational technique was used because 
it is very quick and easy to compute once the 
data have been recorded. 

It is apparent from Table 2 that the validities
for the more accurate rating group are
higher than those for the less accurate group
as predicted. The differences are not large,
but they are extremely consistent (they occur
for every one of the five carefulness tests and
for the composite). No statistical tests of significance
were run on these differences for a
number of reasons. We have informal evidence
that these coefficients differ from Pearson
r’s particularly when relationships are this
slight. Besides, in an exploratory study of this
type, it is the trend, or the consistency, which
is important, and this trend seems to be readily
apparent. Indeed, the consistency of the size
difference in view of the very low
overall validities is really surprising and encouraging.
This is especially true when we
realize that the subjects used as raters were
very young and naive.

We feel, then, that some evidence has been
presented for the generality of rating ability.
Further studies are currently under way to
see if this finding can be duplicated with other
groups and with ratings of other qualities
and to find short paper-and-pencil predictors
of rating ability if we are able to replicate
these results.


TABLE 1
INTERCORRELATIONS OF CAREFULNESS TEST SCORES, VOCABULARY SCORES,
AND CAREFULNESS RATING SCORES
(N = 236)

Variables: Score Checking, Counting Accuracy, Clerical Carefulness,
Letter Comparison, Letter Counting, Average of Ratings by More Accurate
1/4 of Raters, Average of Ratings by Less Accurate 1/4 of Raters.


TABLE 2
COMPARISON OF CAREFULNESS TEST VALIDITIES FOR THE MORE ACCURATE
AND LESS ACCURATE RATERS

Columns: Less Accurate, More Accurate. Rows: Score Checking, Counting
Accuracy, Clerical Carefulness, Letter Comparison, Letter Counting,
Composite Score, Vocabulary-C.


REFERENCE

Schutz, W. C. The Little Jiffy Correlator: A simple
technique for a complex analysis of large numbers
of measures on the same individuals. Educ. psychol.
Measmt., 1960, 20, 111-118.

THE USE OF


RESPONSE PATTERNS TO IMPROVE ITEM SCORING 1


DAVID CAMPBELL

University of Minnesota


This study compared the validity of 2 methods of scoring the Minnesota Vocational
Interest Inventory. One method used keys developed in the usual manner
by selecting items that differentiated between specific occupational groups
and a reference group of tradesmen-in-general. The other method used keys
developed by selecting items that were part of response patterns that differentiated
between criterion and reference groups. 3 occupational keys were developed
and cross-validated: painter, printer, and electrician. Results showed that the
keys were about equal in their ability to separate criterion from reference
groups, but the key developed from response patterns used far fewer items.

1 Computer time for this project was made available by
the Numerical Analysis Center.


Since the publication of the increasingly famous
“Meehl paradox” (Meehl, 1950) showing
that two items which had zero validity
when used singularly could have perfect validity
when used together, there has been considerable
interest in the problem of configural
scoring. Horst (1954) and others (Lubin &
Osburn, 1957; Lykken, 1956) have laid the
mathematical foundations, and other discussions
have reemphasized Meehl’s viewpoint,
i.e., that configural scoring should be an improvement
over conventional methods (Gaier
& Lee, 1953; McQuitty, 1957).

Unfortunately, the empirical results seem
somewhat less promising. While configural
scoring seems to do better than item scoring
when applied to the original validation group,
it doesn’t hold up as well under cross-validation
(Forehand & McQuitty, 1959; Lee, 195¢;
McQuitty, 1957). These results could be accounted
for in at least two ways (or more likely
a combination of the two): first, by
the relative instability of any pattern of response;
or second, because of the large number
of possible patterns, by the influence of
chance on pattern selection. If some method
could be found which could mitigate these
sources of cross-validation shrinkage, then
perhaps the increased validity of the configural
approach could be retained.

The current study was an attempt to use
configurations of responses to establish item
scoring keys. It was not an attempt at configural
scoring as only individual items were
scored. Instead, it used configurations to select
the individual items for the keys. It was
hoped that this approach would retain the advantages
of item scoring (relative stability of
response, clerical ease of scoring, fewer items
selected by chance) and at the same time utilize
the increased validity of the configural
approach.


METHOD 


The basic method 


used was to ¢ 
ing keys, one ke t 


developed in 

using the individual items which bes 
between a criterion group and a refer 
the other key developed by using items 


part of larger patterns of response 
nated between the two groups. In the 
frequencies of response to each item w 
for both groups; and when the differ 


In t! 


ciently large, that item was scored 


patterns were 

The measurin ised was 
Vocational Interest ntory (MVII 
K. E. Clark for use at the skilled trad 
(Clark, 1961). The data 


responses of 917 skilled tradesmen, di 


cupations 


into electricians, painte 


vere part of Clark’s 
Clark’s original reference group of tr men-in-gen- 

il (TIG) was also included. This latter group was 

stratified s: e of the skilled tradesmen used in 

development of the inventory 

In the MVII, triads of stimuli are presented to tl 
individuals, and they are asked to select the one liked 
best and the one liked least. The stimuli are keyed 
by comparing the percentage of respons i 
as electriciar 


When the differen 


criterion group, such 


nce group of TIG 
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( hoice is 
1 


re 
and printers, 
cond ¢ 


lists 
l on 


and Clar 
I 


tne 


criterion 


and these 


iterion group 


th } 
metnod 


sttemnt ft ly 
1 npt tol 


sponses. However, the sc 


tern of response within a triac 


Number 


r 


mp 


} 
| 


e, in Triad 


Dislike B 


pattern will 


patterns 


> 
3. 


“Like 
Dislik 


ord 


Like 


rroups 


this differenc 


scored 


)OK 


a 


RESPONSE 


that 


on 


an example 


Gaevei 


electricians, painters, 
s-validated on 


cro 


hoice separately ; rhts 


at combin: 


weig 


scoring weigh 


1) 
ILKE 


erns we 
sma ] 
individua 


For 


ip is 
1:1 


A-Dislik 


ht r 
it not 


t 


(electrician) In an 


itions of 
for any 

i can be calcluated. For 
in Table 1 the pattern 


between 


PATTERNS AND ITEM SCORING 


rABLE 1 


[TRADESMEN-IN-—GENERAL RESPON 


tame 
attempt t 


individual stimuli involved wet | 
keys. In the above example Like A 


] 4 7 + } : 
be scored. Obviously, this resulted in 


was he 


valid variance, but it pec 
in error variance would more than off 

To how the “refined” scori 
rived this method 
derived 


uli only, Table 


show 
from would differ 


res] irequer 


ONSE 
re- 
pat- 


gives the d 
on the electricians scale. The 
tern for both electri 
scoring weights based 
be- he scoring weights resulting f 
patter Note that | 

weights for Triad 

} 


Ienlat 
Caiculatead 


is scored 
mased on a 
al stimuli 


in 


islike 
} 


A-] 
ought ol 
ating 
For 


? 
Campbell 


+} t 


indi 


I ‘lark 
TI oup 
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stimuli, 


tance 


mail! 


TIG picking one « 


hy pothe

TABLE 3
A COMPARISON OF SCORING WEIGHTS ON ONE TRIAD

Columns: percent of responses, scoring weights.

A second experimental key was also tried. On this 
key, no stimulus was scored unless both patterns con- 
taining it had at least 5% difference in frequency 
between the criterion group and TIG group, and at 
least one of the patterns had 12% difference or more. 
In Triad Number 3, the two patterns containing the 
Like C response were CAB and CBA. Both of these 
have at least 5% difference between the groups; and 
the pattern CAB has an 18% difference, so this re- 
sponse would be scored on the second experimental 
key also. This further restriction eliminated even 
more items from the scale, so it is the shortest of the
three.
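The selection rule for the second experimental key can be written out as a small check. This is an illustrative sketch only: the function name is hypothetical, differences are assumed to be compared by absolute size, and the 6% figure used for pattern CBA in the usage note below is invented, since the text says only that it is at least 5%.

```python
def scored_on_second_key(pattern_diffs, both_min=5.0, one_min=12.0):
    """Decide whether a response is scored on the second experimental key.

    pattern_diffs: the criterion-minus-TIG differences, in percentage
    points, for the two response patterns that contain the response.
    """
    sizes = [abs(d) for d in pattern_diffs]
    if len(sizes) != 2:
        raise ValueError("a triad response belongs to exactly two patterns")
    # Both patterns must differ by at least 5%, and one by 12% or more.
    return min(sizes) >= both_min and max(sizes) >= one_min
```

For the Like C response in Triad Number 3, `scored_on_second_key([18, 6])` would return True under these assumptions, whereas a response whose two patterns differed by 8% and 6% would not be scored.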

The validity statistic used was Tilton’s (1937)
measure of overlap. The greater the separation of
groups, the lower the percentage of overlap. Perfect
separation between a criterion group and reference 
group would be indicated by 0% overlap, while 
100% overlap would indicate no separation at all. 
An overlap of 35% would indicate that 35% of the 
scores of the criterion group could be matched by 
scores in the reference group. 
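A rough way to compute a percentage of overlap from group means and standard deviations is sketched below. It is an approximation under stated assumptions, not a reproduction of Tilton's tables: both score distributions are assumed to be normal, the two standard deviations are averaged, and the function name is hypothetical.

```python
from statistics import NormalDist


def tilton_overlap(mean_a, sd_a, mean_b, sd_b):
    """Approximate percentage overlap of two score distributions.

    Assumes normal distributions and uses the average of the two SDs.
    Returns 100 when the means coincide and approaches 0 as the
    groups separate.
    """
    avg_sd = (sd_a + sd_b) / 2.0
    d = abs(mean_a - mean_b) / avg_sd      # mean difference in SD units
    return 200.0 * NormalDist().cdf(-d / 2.0)
```

Under these assumptions, two groups whose means lie two standard deviations apart overlap by roughly 32%.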


RESULTS AND DISCUSSION 


The results of using these keys on both 
original and cross-validation groups are shown 
in Table 4. 

The keys can be compared by looking at the 
percentage of overlap achieved within any one 
group. The experimental keys show up best 
within the cross-validation electrician group. 
There the current key achieves 35% overlap, 
the first experimental key 31%, and the sec- 
ond experimental key 34%. In the rest of the 
groups, the experimental keys are either equal 
to or slightly less effective than the current 
keys, but they accomplish this by using far 
fewer items. In the extreme example, the 
painter Experimental Key II scores only 
eight responses, yet it achieves a respectable 
separation between painters and tradesmen-in-general
which holds up under cross-validation.


TABLE 4
COMPARISON OF CURRENT KEYS WITH “REFINED” KEYS

Columns: Current, Experimental I, and Experimental II keys, each with
M, SD, and Overlap (%). Rows: original and cross-validation groups, and
number of items.

This decrease in the number of responses 
scored is quite an advantage. The MVII is a 
long inventory, 590 items, and any decrease 
in length would be welcomed by the person 
completing the inventory, and by the person 
responsible for scoring it. Perhaps more im- 
portant, the results imply that items selected 
in this manner are quite powerful. If more 
items can be found which meet the restric- 
tions of this method, and which have zero or 
only a low correlation with the items already 
selected, the experimental scales can be length- 
ened and, hopefully, the validities will in- 
crease. 

One source of legitimate concern is the reli- 
ability of these shortened scales. No direct 
data is available yet on this point, but the 
validities themselves seem reassuring. A scale 
with high validity can perhaps circumvent the 
reliability question simply because it works. 
It does separate groups. As these experimental 
scales do achieve separation between criterion 
and reference groups, separation which holds 
up under cross-validation, the question of 
their reliability does not seem crucial. Of 
course the reliabilities of these scales will be 
checked, if for no other reason than to achieve
psychometric closure. 

It should be noted that the specific meth- 
ods used in this study are applicable only to 
the triad item format. For other item formats,
generalization from these methods and those
suggested by Meehl (1950) should be considered.

AUTOMATED TEACHING METHODS USING LINEAR PROGRAMS


ARNOLD ROE 1

Department of Engineering, University of California, Los Angeles


186 freshman engineering students studied elementary probability by different
teaching methods. Multiple choice teaching machines, free-response teaching
machines in individual booths, free-response teaching machines in a classroom,
programed textbooks requiring overt responses and providing correct answers,
programed textbooks requiring no overt responses, “programed” lecturers, and
standard lecturers were compared. No significant differences were observed between
the performance of the students learning by any of the programed machine,
programed textbook, or programed lecturer methods: all of the programed
methods were significantly better than the standard lecturer.

1 In collaboration with H. W. Case, Mildred Massey,
Weltman, and D. Leeds of the Department
of Engineering, University of California, Los Angeles.

2 ... called “Teaching Systems Research Project.”


The Automated Learning Research Project 2
in the Department of Engineering, University
of California, Los Angeles, is directed toward
the investigation of the basic properties of
auto-instructional systems.

The research program is divided into several
phases. The primary goal of the first
phase was to develop a high-quality teaching
program for use as a test vehicle in obtaining
data in subsequent experiments. As part of
this phase, a pilot study was conducted during
May 1960, in which 51 freshman engineering
students were taught the elements of
probability by various auto-instructional and
lecture techniques. The pilot study provided
a check on the comprehensibility, reliability,
and validity of the programed instructional
material, the screening tests, the posttests,
and the subjective questionnaires used in the
subsequent experiment. Experimental control
and computational techniques were also developed
during the pilot study. Assumptions
on normality and homogeneity of variances
were verified at this time (see Roe, Bishop,
Kirk, Massey, Moon, Tarter, & Weltman,
1960).

One of the objectives of the second phase
in the research program was to determine
whether different methods of presenting
linearly programed course material had measurably
different effects on student performance
during learning and on posttests, and
an experiment was conducted
during September 1960 to explore this question.


DESCRIPTION OF EXPERIMENT 
Subjects 


The pilot study had indicated that a sample size
greater than 150 students would be required to give
a powerful test of the hypotheses. Therefore, all 186
students enrolled in the seven sections of the Freshman
Engineering Laboratory Course at the University
of California at Los Angeles participated in this
experiment. These students were selected for the following
reasons: (a) They had previously taken the
Lower Division Engineering Examination (LDEE),
an aptitude-type test, the results of which could be
used to divide the students into aptitude quarters.
(b) The pilot study had indicated that freshmen
students had little or no previous knowledge of the
subject matter which would be taught in the experiment
and that there was very little correlation
between any previous knowledge and performance
during the experiment. (c) The subject matter was
sufficiently similar to the material normally taught
during the first weeks of the Freshman Engineering
Laboratory that it could be incorporated into the
requirements of the course.


Methods of Instruction


Two types of te iching machine 
gramed textbooks, and 
used 


two typ¢ 
One of the machines was a Skinner-Type Free 
sponse Machine (FRM), a 


the controlled present atic 


mechanical device 
yn of a carefully constr 

juence of instructional iter The other machin 

electromechanical Multiple Choice Machine 


which 


was an 
MCM) 
of in 


1utomatically advances the sequence 
makes tl 
If the stu 
machine scores th 


ind marks th 


tructional items after the student 


correct choice from three alternatives 


makes a choice, the 


wrong 
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st then make the correct response to formance on the posttests led to modifications in th 


uctional material t next item. original sequence and to elimination of some items 
a programed textbook, the student reads The revised instructional juence contained 1 
instruction, writes his response next to items. Identical items wet sed in the FRM 
ns the page to > correct response, PTR. Identical items with the addition of two 
see the next item, and so on. This correct respons for each item were used in 
amed Text requires an overt MCM. Identical items with the response included in 
provides imn 1} to the the item were used in the PTNR. The programs 
he correctn his response. Both lecturers loosely followed the same sequence of it 
tent with the currer heories for The inte i 
students 
iterial is 
yusly se- 


uch small Environments 


5% 

Thers 
provided 
use 
provided 

other room 
or stu ly with 


hearing the programs 


ext (PTNR 


Vide no 


randomly 


groups 


manner 

cturers 

or 

utomati 

sam 

e written 

1ed material in oral form 

current experiment same two 

ignated Programed turers (T 

instructor, wh t familiar 
[ ynal iten 


outline 
an ex imple ot the 
students would take, and 


} } 
ich co 


1] Matter tionnal ana ne examuinalt 


+} 


groups, 
truction on elem 1 be 
of 230 instruc- 
eveloped acct 
enumerated in Roe and Moon 
the pilot study. The “learning 
1e form of the perform » examination 
] and from these were The examinations wet 
ructional items. An an i f but the questionnaires were 
h item and of student , > autom 1 learning and lectu1
The questionnaire for the automated learning groups asked
for an evaluation of the automated method, mate- 
rials, and environment; the questionnaire for the 
lecture groups asked for an evaluation of the in- 
structor, the materials, and the environment. 


RESULTS 


A two-way analysis of variance (Methods ×
Aptitude Quarters) on both the posttest scores 
and on learning time failed to indicate any 
significant interactions. With the exception of 
the students in the control group (who scored 
significantly lower on the posttest than all of 
the other groups), there was no significant dif- 
ference between the posttest scores of students 
in any group of teaching methods. However, 
the high aptitude students achieved signifi- 
cantly higher posttest scores than the stu- 
dents in the second highest aptitude quarter. 


TABLE 1 


AN ANALYsIS OF COVARIANCE 


y Environment® 


Mean 


test 


FRM & MCM versus 
\R versus T; & T 
58 13.2 
13.6 


14.1 




TABLE 2 


LEARNING TIME 


Mean 
learning 
time 


Free response 
machines 

Multiple choice 
machines 

Programed texts, 


overt responses 160.6 


Programed texts 
no response 32 92.7 5.8 164.7 
Programed 


lecturers | Q 161 


1/156; F =91.3; signif 


With this in mind, an analysis of covari- 
ance was made, where the individual LDEE 
aptitude score was used as a covariate with 
each student’s posttest score and with the 
learning time required by each student. 

Subjective opinions of the students about 
the various teaching methods, indicated by 
the “liking” ratings, did not correlate either
with their aptitudes or with their perform-
ances on the criterion tests.

The average percentage of incorrect re-
sponses made by students using the MCM
and PTR (which recorded errors) was less
than 11%. This figure could be used, with
some caution, in evaluating the difficulty level
(or adequacy) of the programed teaching
material.


CONCLUSIONS 


For linearly programed subject matter,
there appears to be little justification for
preferring one method of presentation over
another, considering the effect on the level of
student performance.3 The important vari-
able seems to be the program of instruction.
If this has been carefully conceived, the par-
ticular method of presenting the program does
not significantly influence the level of student
performance. Some of the hardware currently
being used to display programed material
may therefore be unnecessary, particularly if
it takes longer for a student to complete a
given programed course with the device than
with a simple printed textbook version.

3 Retention tests conducted by J. Griver 6 months
after the learning sessions indicated a mean reten-
tion of approximately 60%, with none of the teach-
ing methods showing a significantly different effect
on retention.

Some machine features, such as anticheat 
mechanisms and the recording of items which 
are missed, do not necessarily enhance stu- 
dent learning; rather they are convenient fea- 
tures for the experimenter who wishes to 
evaluate student performance or particular 
items of a teaching program. If the emphasis 
is on using a more or less perfected program 
for student learning, then many of the ma- 
chine features are unnecessary and may actu- 
ally impede student learning. If the emphasis 
is on improving the program, then most ma- 
chine devices currently being used could be 
improved to facilitate this task. If the ex- 
perimenter wishes simultaneously to teach 
and to improve the program, he will require 
relatively sophisticated hardware designed ac- 
cording to new concepts (see Roe, Lyman, & 
Moon, 1961). 

Experimental data indicated that the dif-
ference between using multiple choice items
versus recall or free-response items (the sub-
ject of much previous dispute) and the dif-
ference between individual booth and class-
room environments did not significantly af-
fect student learning.

Current concepts of programing material
still depend largely on “arranging appropri-
ate contingencies of reinforcement” to elicit
a specified student performance. However, the
experiment indicated that overt student re-
sponses followed by immediate feedback on
the correct response did not enhance student
learning, but rather increased the time neces-
sary for performing the learning task.

While it would be imprudent to generalize
from the results of this series of experiments
about all types and levels of course materials,
and about all student ages and backgrounds,
it appears that the experimental data do not
coincide with some of the currently publicized
advantages of certain auto-instructional tech-
niques, particularly as applied to linearly pro-
gramed material. This does not mean that
proper programing of instructional material
is not beneficial to the student. On the con-
trary, the program itself seems to be the im-
portant factor,* and the method or device for
displaying the program will depend on the
economical and environmental circumstances
that prevail in each case.

In conclusion, the investigators believe that
present theories and auto-instructional tech-
niques are inadequate to achieve the goals of
effective individual instruction, and that a
workable automated teaching system will re-
quire further analytical and hardware de-
velopment.


*A recent study by K. Roe (1961) on ordered
versus scrambled sequence of items raises some ques-
tion about the relative importance of this factor.


REFERENCES 

Roe, A., Bishop, J., Kirk, W., Massey, Mildred,
Moon, H., Tarter, M., & Weltman, G. A pilot
study: Automated learning research project. U.
Calif. Dept. Engng Rep., 1960(May), No. 60-53.

Roe, A., Lyman, J., & Moon, H. The dynamics of
an automated teaching system. Automated teach.
Bull., 1961, 1(4), 16-25.

Roe, A., & Moon, H. Analysis of course content for
individual learning. Automated teach. Bull., 196…,
1(3), 3-11.

Roe, K. V. Scrambled vs. ordered sequence in auto-
instructional programs. U. Calif. Dept. Engng
Rep., 1961(Aug), No. 61-48.


Received July 11, 19 





Journal of Applied Psychology 
1962, Vol. 46, No. 3, 202-211 


THE EFFECTIVENESS OF KNOWLEDGE OF RESULTS IN A 
MILITARY SYSTEM-TRAINING PROGRAM 1


L. T. ALEXANDER 


System Development Corporation, Santa Monica, California 
C. H. KEPNER 2 AND B. B. TREGOE 2


Kepner-Tregoe and Associates, Incorporated, Pacific Palisades, California 


An experiment to investigate the effects of knowledge of results (KR) on per- 
formance improvement of a man-machine, information-processing system. 4 
13-man crews were given a pretest exercise, then 12 training exercises, then a 
posttest exercise in air defense operations using their own operational equip- 
ment and a simulated air environment. 2 experimental crews received KR at a 
postexercise debriefing; 2 control crews received no KR or debriefing. The ex- 
perimental crews improved more than the control crews in all but one func- 
tion but improvement across functions was not equivalent. An inverse relation- 
ship was discovered between the operational visibility of a function and amount 
of improvement when postexercise KR was held constant. Visibility is discussed
in its role as an information-feedback concept in system design and system
training.


1 The authors wish to thank Allan Katcher, Jack
Jaffe, and Miles S. Rogers, who contributed to the
planning; Richard Behan, who organized the sta-
tistical analysis; Ida Larsen and Robert Meeker,
who helped in the analysis of the data; and Joseph
Fink, System Development Corporation representa-
tive in the Air Defense Division in which this study
was conducted.

2 Formerly of System Development Corporation,
Santa Monica, California.


This paper reports an experimental field
study of how knowledge of results affects per-
formance improvement of operations teams in
a man-machine, information-processing sys-
tem: an Air Defense Direction Center.

The study is one of a series conducted in
support of the System Training Program of
the Air Defense Command. A complete de-
scription of the program has been presented
elsewhere (Carter, 1958; Chapman, Kennedy,
Newell, & Biel, 1959; Goodwin, 1957) and
will only be summarized here. However, to en-
able the reader to evaluate the behavioral
measures used, a description of the operations
of an Air Defense Direction Center will be
given.

SYSTEM


The United States Air Force Air Defense
Command maintains an aircraft control and
warning network whose mission is to defend
the United States against air attack. The net-
work is composed of a series of radar sites
called Direction Centers, each assigned a spe- 
cific geographic area of responsibility. Within 
this area, the operations crew must maintain 
constant surveillance of the air space, identify 
all aircraft which fly within it, and take tac- 
tical action against any of these aircraft that 
prove hostile. (The current network includes 
high speed digital computers which assist in 
air defense operations. The experiment re- 
ported here was conducted in a manual direc- 
tion center in which there was no computer.) 

Surveillance is maintained by radar scope 
operators. As a blip appears on a scope, the 
scope operators must report to the plotters 
the range and azimuth of the blip and the 
time at which it first appears. They must re- 
port successive positions of the blip every 2 
minutes. 

The plotters record this information on
a large transparent vertical plotting board, 
which is the common information storage and 
display facility for the entire crew. Conse- 
quently, it is important that the board show 
an accurate picture of the aircraft in the 
space around the site and that this picture be 
kept up to date and uncluttered. 

The crew must make positive identification 
of all aircraft which fly through its area of 
responsibility. In the majority of cases, identi- 
fication is made by correlating the location, 
direction, speed, and height of a track with a
flight plan previously filed by the pilot. If a 
track appears on the vertical board outside 
the tolerance limits of any flight plan avail- 
able, the track is declared “unknown” and 
fighter interceptor aircraft are ordered scram- 
bled. The Intercept Directors guide the fight- 
ers to the unknown aircraft. 
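
The identification rule described above (declare a track "unknown" when it falls outside the tolerance limits of every available flight plan) can be sketched in a few lines. This is only an illustration: the `TrackState` fields, the tolerance values, and the function names are assumptions introduced here, and the paper does not state the actual limits or the procedure the crews followed.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    x_miles: float       # east-west position
    y_miles: float       # north-south position
    heading_deg: float   # direction of travel
    speed_knots: float
    height_ft: float


# Hypothetical tolerance limits; the paper does not give the real values.
TOLERANCE = TrackState(x_miles=10.0, y_miles=10.0, heading_deg=15.0,
                       speed_knots=50.0, height_ft=2000.0)


def heading_difference(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def matches_plan(track: TrackState, plan: TrackState,
                 tol: TrackState = TOLERANCE) -> bool:
    """True when every attribute of the track is within tolerance of the plan."""
    return (abs(track.x_miles - plan.x_miles) <= tol.x_miles
            and abs(track.y_miles - plan.y_miles) <= tol.y_miles
            and heading_difference(track.heading_deg, plan.heading_deg) <= tol.heading_deg
            and abs(track.speed_knots - plan.speed_knots) <= tol.speed_knots
            and abs(track.height_ft - plan.height_ft) <= tol.height_ft)


def identify(track: TrackState, flight_plans: list) -> str:
    """Classify a track as 'known' or 'unknown' against the filed flight plans."""
    return "known" if any(matches_plan(track, p) for p in flight_plans) else "unknown"
```

Here a flight plan is represented by the state the aircraft is expected to have at the moment of observation; a track that matches no plan on all five attributes is the case in which interceptors would be scrambled.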

Since no Direction Center can conduct air 
defense by itself, its activities must be co- 
ordinated with those of the sites contiguous 
to it. The coordination is accomplished by 
lateral tellers who transmit relevant informa- 
tion from the vertical board to adjacent Di- 
rection Centers. 


TRAINING PROGRAM 


The Air Defense Command System Train- 
ing Program provides crews with practice in 
dealing with potential hostile threats. Train- 
ing exercises are designed to stress all aspects 
of air defense operations and team work is 
emphasized. 

During a training exercise, a simulated air 
picture is electronically presented directly on 
the radar scopes. Complete records are kept 
of all actions taken by the crew in defense 
against the air threat and, on the basis of 
these records, knowledge of results is pre- 
sented to the crew. At the conclusion of each 
exercise a debriefing is held. This debriefing 
consists of a discussion by the crew of the air 
situation with which it was confronted and 
of actions taken. The crew tries to identify 
problems which arose during the exercise and 
to generate solutions which will result in im- 
proved operational methods. These methods 
may be tested in subsequent exercises. 


PROBLEM 


The knowledge of results required by an 
air defense radar crew in the course of its 
training is likely to be more heterogeneous 
than is that required by the learner in an in- 
dividual learning situation. The crew is made 
up of a group of individuals, each having dif- 
ferent but interrelated tasks. Each individual 
contributes to the performance effectiveness 
of the crew as a whole. The success with
which the crew performs its air defense mis-
sion depends upon the level of in-
dividual skill of its members and also upon
their interactive skills, i.e., how well they can
work together. Both kinds of skills contribute 
to the efficiency of the crew and both must be 
learned. 
The knowledge of results presented to the
crew in a postexercise report is the primary
objective data by which both kinds of
deficiencies may be uncovered. Many
of these may be dealt with by the individual 
operator himself or with the help of his su- 
pervisor. However, because of the interrelated 
nature of the tasks in a complex man-ma- 
chine system some of these deficiencies, espe- 
cially those involving interactive skills, con- 
stitute a problem for the crew as a whole. In 
this case they may be attacked in the de- 
briefing by group problem solving methods. 
In similar training situations it has been dem- 
onstrated that knowledge of results does pro- 
duce greater group problem solving efficiency 
(Smith & Knight, 1959), and also serves to 
sustain and guide motivation (MacPherson, 
Dees, & Grindley, 1948; Pressey, 1950; Ross, 
1927). 

There is a limit to the amount of informa- 
tion which can be assimilated by the crew. 
Although the weight of evidence indicates 
that there should be some kind of knowledge 
of results, if too much is given, confusion and 
deterioration in performance result (Bilodeau, 
1952; Bilodeau & Morin, 1951a, 1951b;
Morin & Gagné, 1951). The main problem 
seems to be how to deal with the large amount 
of information which is available. How much 
information should be given? In what form 
should it be presented? Should it be oriented 
toward the individual operator, to the crew 
as a whole, or to some subgroup? If no 
knowledge of results is given, each individual 
in the crew will tend to develop and use his 
own sources of information. Other studies 
have shown that subjects will obtain and use 
such information from the experimental situa- 
tion in a similar manner (Morin & Gagné, 
1951; Ross, 1933). Although this information 
may be quite accurate, it may lead to differ- 
ent standards of performance and different 
criteria of performance adequacy among crew 
personnel which may be inimical to the 
smooth functioning of the crew as a whole 
(Ross, 1933; Seashore & Bavelas, 1941).

Knowledge of results could be given which
is specific to the performance of each op-
erator in his task. There is evidence that the
more specific the knowledge of results, the 
more rapid the learning and the higher the 
level of performance reached (Ammons, 1956; 
Trowbridge & Cason, 1932; Waters, 1933). 
However, if only operator-specific knowledge 
of results were to be provided, the crew would 
have difficulty using it to identify instances 
of poor coordination between crew members. 
This might lead them to neglect the develop- 
ment of interactive skills. 

At the other extreme, knowledge of results 
could be given in terms of total group effort, 
e.g., the number of hostile aircraft inter- 
cepted. Information of this sort would prob- 
ably be so general that it would not be useful 
for uncovering specific operational problems. 

In this study, knowledge of results was or-
ganized by air defense functions. The crew
received a summary of how well it had done
in surveillance, in lateral telling, in tactical
action, etc. Each of these functions is com-
posed of a complex of interrelated tasks and
good performance requires the effective co-
operation of almost every member of the
crew. It was hypothesized that if knowledge
of results was organized in this way, it would
emphasize both individual and team behavior,
would enable the crew to identify and im-
prove weak areas, and would still not pro-
duce an overwhelming amount of detail to
consider. It was predicted that crews which
received function-oriented knowledge of re-
sults would improve their performance more
than crews which received no knowledge of
results.

The specific experimental questions asked
were:

1. Will knowledge of results organized by
function and presented for discussion in a
debriefing session produce a greater increase
in performance than practice alone?

2. Will knowledge of results have an equally
enhancing effect on all functions?


METHOD 
Characteristics of the Experimental Site


The experimental site is located in a coastal di-
vision. At the time this experiment was run, the site
was not yet incorporated in the aircraft control and
warning (ACW) network. However, all of its elec-
tronic and communications equipment was opera-
tional and a full complement of personnel was avail-
able for assignment. These personnel were experienced
in air defense operations although they had never
worked together as a team.


Subjects 

The entire complement of operations personnel
was used to make up four 13-man crews which
were used as subjects. Crews were equated by as-
signing personnel on the basis of the following
criteria: (a) the number of months of ACW experi-
ence, (b) rank, (c) skill classification, and (d)
scores on an operations information test. Two crews
(A and C) were assigned to an experimental group
and two (B and D) to a control group. The com-
position of the crews is shown in Table 1. Crew
personnel were introduced to the System Training
Program by means of a lecture and a film. They
were then given two shakedown runs on a practice
problem.


Experimental Design 


The experiment itself was of the pretest, posttest
design in which performance improvement of the
experimental group was compared with that of the
control group.

The pretest consisted of one run per crew on
each of two test problems. Thereafter, each crew
had two runs in each of six training exercises,
a total of twelve training exercises per crew. The
posttest was the same as the pretest. A grand
total of 64 exercises was run for the four crews.
Exercises were run twice a day, 5 days a week for
2 months. After each of the training exercises the
two experimental crews were provided knowledge of
results in a debriefing session. The two control
crews received the same number of training
exercises but did not receive either formal
knowledge of results or debriefing sessions.

After the series of training exercises each crew
was given a “unique problem” exercise. The problem
was designed to present several stressful air defense
situations which had not been encountered in the
training problems. The purpose of this exercise was
to determine how the crews would respond to novel
situations and to evaluate the hypothesis that system
training with knowledge of results increased the
flexibility and adaptability of crews. The experi-
mental schedule is presented in Table 1.

The two problems selected for the tests were high-
load situations containing 42 and 49 flights, respec-
tively. In addition, each problem contained 1…
“critical” flights. Critical flights are designed to
create a difficult situation to which the crew must
react. Examples are mass raids, aircraft without
flight plans, aircraft which deviate from their flight
plans, etc. Table 2 shows the characteristics of the
test problems.

Crew performance information was collected by
a military team which had been trained by and
worked under the supervision of the experimenters.
This information was obtained in three ways: from
observation of the vertical board on which were dis-
played all tracks processed by the crew; from
logs maintained by experimenter personnel who simu-
lated adjacent sites; from logs maintained by experi-
menter personnel who simulated interceptor pilots.
The objective criteria for successful completion of
each stage of information processing of tracks which
were used were obtained from military regulations.

In order to minimize the possibility of transmission
of information among crews, the following proce-
dures were utilized: differential sequencing of ex-
ercises, crew rotation, coding of problem numbers,
fostering of a spirit of competition by instituting a
“Crew of the Month” award. Observation by the
experimenters and interviews during the course of
the project indicated that little information
of value was passed among the crews.


TABLE 1


CREW COMPOSITION


                                  Experimental    Control
                                  group           group

Mean months of experience
Military rank
    No. S/Sgt & A…
    No. A…
    No. A…
Technical skill classification
    No. of 5-0 personnel
    No. of 3-0 personnel
Mean score on operations test


RESULTS

The results are presented in Table 3. The
table shows the percentage of tracks which
were handled correctly at each functional
stage of information processing by the experi-
mental and control crews in the pretest and
posttest exercises. The results of the two ex-
perimental problems have been combined.



In the stub of the table, the four stages of 
information processing are listed. The second 
column, N, shows the number of flights which
were programed to appear in the area of 
responsibility of the experimental site. The 
percentages in the body of the table are based 
upon these numbers. The letter E indicates 
that performance in the particular stage of 
information processing was based upon the 
number of flights which were established as 
tracks by the crew. The method of evaluating 
performance is explained below.

The two main sources of input to the sys-
tem are tracks which are crosstold (X-T)
from adjacent sites, and flights which are
picked up on the radar of the site itself.
Adequate detection of cross-told flights, Line
2, means that information about flights which
were crosstold from an adjacent site was
correctly on its vertical board. Plotting of 
radar data, Line 3, means that a blip which 
appeared on the radar of the experimental 
site was plotted on the board at the correct
azimuth and range with the correct time of
appearance.

Track establishment is an extremely im-
portant stage of information processing for
the system. Until this point is reached, all
the information about tracks in the area
which is displayed on the vertical board is
considered tentative. When a track is estab-
lished, it is assigned a track number and be-
comes the responsibility of the experimental
site. Consequently, it was considered neces-
sary to evaluate the performance of functions
which follow track establishment on the basis
of the number of tracks which were success-
fully established and not on the basis of the
number of tracks which were designed into
the problem. After track establishment the
site must maintain an accurate, up-to-date
record of the track until it is disposed of in
one way or another. In Table 3 the establish-
ment of cross-told tracks, Line 6, and of radar
tracks, Line 7, are considered separately.

Information on all tracks (critical or not)
plotted on the vertical board, must be kept
accurate and current. This is the track main-
tenance function. It is important because
vertical board information is essential for
taking tactical action and for providing warn-
ing to other sites. The maintenance of critical
tracks and noncritical tracks is shown in
Lines 10 and 11.


TABLE 2


CHARACTERISTICS OF TEST PROBLEMS


                                                    Test problems
                                                    I        II

A. Total number of flights appearing …
    1. Number lateral told from adjacent sites
    2. Number initially appearing on radar
B. Total number of flights critical for …
C. Total number of flights requiring lateral-tell-out


TABLE 3


PERCENTAGE OF TRACKS PROCESSED CORRECTLY


Functional stage of information           Pretest                  Posttest
processing and data category      N a     Experimental  Control    Experimental  Control
                                          group         group      group         group

 1. System inputs
 2.     Detection (of X-T)
 3.     Plotting (radar data)
 4.     All inputs
 5. Track establishment
 6.     X-T flights
 7.     Radar data
 8.     All tracks
 9. Track maintenance
10.     Critical tracks
11.     Noncritical tracks
12.     All tracks
13. System output
14.     Adequate tactical action
15.     Adequate tactical action (abs.)
16.     Lateral-tell-out
17.     Lateral-tell-out (abs.)

a This column gives the number of flights which entered the site area of responsibility in the two test exercises. The letter E
indicates that the percentages in that row are based upon the number of flights established as tracks by the respective crews. This
… indicated by parentheses. For example, in Row 10, in the pretest, Crew A established nine critical tracks and of these, 67%
were maintained correctly; Crew C established four and maintained 0%.


Taking tactical action against hostile flights 
and disseminating warning information to 
adjacent sites (lateral-tell-out) constitute the 
two major output functions of the experi- 
mental site, Lines 14-17. Performance pro- 
ficiency in these functions was measured in 
two ways. First, on the basis of the number 
of tracks that were established, that is, on the
number of tracks that were processed suc-
cessfully through all preceding stages, Lines
14 and 16. Second, on the basis of the number 
of tracks requiring such action which were 
originally designed into the problem, Lines 
15 and 17 (absolute number). 

Consider Lines 16 and 17 of the table. 
These are concerned with the lateral-tell-out 
function. Line 16 shows the percentage of 
tracks laterally told out successfully based 
upon the number of tracks which were estab- 
lished. The table shows that in the pretest, 
Crew A established 14 of the 46 flights which 
required lateral-tell-out (see Table 2). Of 
these 14, they successfully told out 28%. In 
the posttest, Crew A established 34 of the 46 
flights, of which they successfully told out
79%. These figures indicate that in the post-
test, Crew A not only established a greater
number of flights, but disposed of these flights
more successfully.


TABLE 4


ANALYSIS OF VARIANCE DESIGN


Source                      df

Between crews                3
    Groups (G)               1
    Residual                 2

Within crews                12
    Trials (T)               3
        Sessions (S)         1
        Problems (P)         1
        S X P                1
    T X G                    3
        S X G                1
        P X G                1
        S X P X G            1
    Error                    6

Total                       15


Line 17 indicates that 46 flights were de-
signed into the two test problems to require
lateral-tell-out to adjacent sites. Of these, in
the pretest Crew A successfully told out 9%,
and Crew C, 2%. The control group crews
successfully told out 15% and 4%, respec-
tively. In the posttest, Crews A and C had
improved to 59% and 65%, respectively,
whereas the control group crews remained at 
about the same level of performance. 
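
The two ways of scoring an output function (relative to the tracks a crew established, Lines 14 and 16, and relative to the flights designed into the problems, Lines 15 and 17) can be checked with a little arithmetic. The sketch below uses Crew A's lateral-tell-out figures quoted in the text. The told-out counts of 4 and 27 are not given in the paper; they are back-calculated here as an assumption from the reported percentages, and the function names are illustrative only.

```python
DESIGNED = 46  # flights requiring lateral-tell-out in the two test problems (from the text)

# Established counts are from the text; told_out counts are inferred (assumption).
CREW_A = {
    "pretest": {"established": 14, "told_out": 4},
    "posttest": {"established": 34, "told_out": 27},
}


def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


def score(counts: dict, designed: int = DESIGNED) -> dict:
    """Relative score uses established tracks; absolute score uses designed flights."""
    return {
        "relative": percent(counts["told_out"], counts["established"]),
        "absolute": percent(counts["told_out"], designed),
    }


for session, counts in CREW_A.items():
    print(session, score(counts))
```

Under these assumed counts the posttest values come out at about 79% (relative) and 59% (absolute) and the pretest absolute value at about 9%, in line with the text; the pretest relative value comes out near 28.6%, slightly above the reported 28%, which may reflect rounding in the original.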

From the data in Table 3 it can be seen
that in almost every case the performance of
the experimental crews improved while the
performance of the control crews either re-
mained about the same or decreased slightly.
The only function in which this was not the
case was tactical action (Line 14). This
deviation is probably due to the small number
of cases upon which the percentage was
based.

An arc-sine transformation of the data was 
made in order to reduce the correlation which 
exists between the mean and variance of 
percentages. An analysis of variance was then 
performed on the transformed data. Bartlett’s 
test indicated that the variances were homo- 
geneous. Table 4 shows the sources of vari- 
ance and the degrees of freedom associated 
with each. “Trials” refers to the four test
exercises in the pretest and posttest sessions. 
The residual variance is that which exists be- 
tween crews in the same group. The error 
variance comes from pooled crews within 
groups. This analysis was performed sepa- 
rately for all the functions listed in Table 3. 
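
The arc-sine transformation mentioned above can be illustrated briefly. The paper does not say which form was used; the sketch below assumes the common angular form, arcsin of the square root of the proportion, expressed in degrees. The function name is hypothetical.

```python
import math


def arcsine_transform(percentage: float) -> float:
    """Angular transform of a percentage: arcsin(sqrt(p)) in degrees.

    Assumed form; some texts use 2 * arcsin(sqrt(p)) or radians instead.
    """
    p = percentage / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must lie between 0 and 100")
    return math.degrees(math.asin(math.sqrt(p)))


# Equal steps in percentage are stretched near the extremes and
# compressed near 50%, which is what weakens the mean-variance link.
for pct in (2, 9, 50, 59, 95, 99):
    print(pct, round(arcsine_transform(pct), 1))
```

A 4-point difference near the top of the scale (95% to 99%) spans more of the transformed scale than a 4-point difference near the middle (50% to 54%), so scores near 0% or 100%, whose raw variance is necessarily small, are spread out before the analysis of variance is run.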

In this study we were primarily interested 
in the Sessions X Group interaction variance 
associated with each function. If the F value 
of this interaction for a particular function is 
significant, it would indicate that the knowl- 
edge of results provided to the experimental 
crews and denied to the control crews pro- 
duced a differential effect in the performance 
of that function from the pretest to the post- 
test. Table 5 shows the variance analysis for 
the Sessions X Groups interaction for all func- 
tions. Of the 13 functions, this interaction is 
significant beyond the 5% level in seven in- 
stances. Fewer than one instance can be ex- 
pected by chance alone. 
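
The chance expectation quoted above follows from simple arithmetic: 13 tests at the 5% level give 13 x .05 = 0.65 expected significant results. The sketch below also computes a binomial tail probability for seven or more significant results. That second figure assumes the 13 tests are independent, which is an assumption made here for illustration and is unlikely to hold strictly, since some functions (e.g., "All tracks") combine others.

```python
import math


def expected_by_chance(n_tests: int, alpha: float) -> float:
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of k or more successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(expected_by_chance(13, 0.05))   # expected count under chance
print(prob_at_least(7, 13, 0.05))     # tail probability, independence assumed
```

Under the independence assumption the tail probability is on the order of one in a million, so seven significant interactions out of 13 are very unlikely to be a chance outcome.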

The unique problem presented after the
training series was an attempt to create a set
of situations for which the crews had not been
specifically prepared by their training but
which would test their ability to extend and
adapt the new operational procedures they
had developed. Early in the problem the
adjacent radar site, upon which the experi-
mental site had learned to depend for much
of its surveillance and identification informa-
tion and for scrambling interceptors, was sud-
denly “eliminated by enemy action.”

The performance of the experimental and
control crews in the unique problem is shown
in Table 6. The data refer only to critical
flights. Track maintenance and track estab-
lishment have been combined to give an over-
all picture of the surveillance function. In
each case, the experimental crews performed
better than the control crews. The range of
performance of the experimental crews was
58% to 84%; of the control crews, 16% to
58%.


TABLE 5


SESSIONS X GROUPS INTERACTION VARIANCE
FOR ALL FUNCTIONS


Stage of information processing and category of data

System input
    Detection (of X-T)
    Plotting (radar data)
    All inputs

Track establishment
    X-T flights
    Radar data
    All tracks

Track maintenance
    Critical tracks
    Noncritical tracks
    All tracks

System output
    Adequate tactical action
    Adequate tactical action (abs.)
    Lateral-tell-out
    Lateral-tell-out (abs.)

a In every case the triple interaction … significant (Table 4); the error MS with 6 …
used as the denominator of the F ratio.


TABLE 6


PERFORMANCE OF EXPERIMENTAL AND CONTROL CREWS ON CRITICAL
FLIGHTS IN UNIQUE PROBLEM


                Percentage of tracks    Percentage of tracks    Percentage of
                established and         successfully            successful
Group           maintained              lateral-told-out        tactical action
                (N = 37)                (N = 35)                (N = 19)

Experimental
Control


DISCUSSION


The results show unequivocally that the
experimental crews improved more than the
control crews from the pretest to the posttest.
The experimental crews improved in every
function while the control crews either im-
proved only slightly or retrogressed. The
average performance gain over all functions
of the experimental crews was 35.1% for
Crew A and 49.6% for Crew C. The gain of
the control crews was 0.1% for Crew B and
4.0% for Crew D.

The conclusion to be drawn from these data
is that knowledge of results and debriefing are
necessary factors for improving the perform-
ance of crews operating in man-machine sys-
tems. If crews practice as a team, if they are
provided with detailed, function-oriented in-
formation about their performance, and if
they have the opportunity to discuss and
evaluate this information in a debriefing ses-
sion, they will demonstrate greater perform-
ance improvement than crews which are
provided with practice alone.

In view of the regularity in the data, why
did the F test for the Sessions X Group in-
teraction fail to reach an acceptable level of
significance for some of the functions? The
simplest hypothesis is that for these func-
tions the error variance was too large due to
the small number of tracks which had to be
processed. The functions to be considered are
(a) plotting of radar data, (b) the main-
tenance of both critical and noncritical tracks,
(c) the establishment of tracks from radar
data, and (d) the taking of adequate tactical
action (see Table 3). Since the experimental
crews improved more than the control crews
in all these functions, the hypothesis pre-
sented is a reasonable one.

But even if we accept this explanation,
another question arises. Although the experi-
mental crews improved more than the control
crews in all functions, why was the relative
gain not the same for all functions? For ex-
ample, the experimental crews improved in
tactical action by 14%; the control crews
by 11.5%. However, in telling out warning
information, the experimental crews improved
by 62% whereas the control crews actually
retrogressed by 9.5%.

For any particular function, one would ex- 
pect to find a relationship between amount of 
improvement and knowledge of results avail- 
able. In a training exercise there are two kinds 
of knowledge of results that are potentially 
available. First, there is the information in- 
cluded in the postexercise report. Although 
the knowledge of results thus provided can 
account for the overall superiority of the 
experimental crews it cannot account for the 
difference in relative gain among various 
functions because the report contained in- 
formation about all functions. Second, there 
is information available to the crew during 
the training exercise: that is, feedback obtained
from the performance of the operations
functions themselves. There is evidence
that the availability of this operational in- 
formation may account for the differences in 
relative gain. 

We have found in the data a relationship 
between the amount of improvement in a 
function and a characteristic of that function 
which we will call its visibility. By the visibility
of a function we mean the availability,
in the operating environment, of information 
about the adequacy of performance of the 
function. 


The data indicate that there is an inverse relationship between the
visibility of a function and the amount of performance improvement
demonstrated for that function. In other words, under conditions of
postexercise feedback those functions which provide less information
on performance adequacy during operations show greater improvement
than those that provide more (see Table 7). In the first column of
Table 7, five functions are ordered from high to low visibility. The
average performance gains for the experimental and control crews are
given in the right-hand columns of the table. The table entries are
the percentages of all tracks processed through each function. They
were obtained by averaging the gains of the two experimental crews and
the two control crews as shown in Lines 14, 8, 4, 12, and 16 of
Table 3. For example, in Line 14 of Table 3 the average score of the
experimental crews, A and C, in the pretest is 48.5% and in the
posttest is 62.5%. The difference between these scores represents a
gain of 14.0%. Corresponding scores for the control crews, B and D,
are: in the pretest, 19.0%; in the posttest, 30.5%; a gain of 11.5%.
One can see that when training is provided after a training exercise
there is an almost perfect inverse relationship between degree of
visibility of a function and performance gain.
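
As a check on the arithmetic in the Line 14 example, the gain is the posttest percentage minus the pretest percentage, computed separately for each group. The short Python sketch below reproduces the two figures quoted in the text; the function name and the dictionary layout are illustrative assumptions and are not part of the original study.

```python
def performance_gain(pretest_pct, posttest_pct):
    """Return the gain, in percentage points, from pretest to posttest."""
    return posttest_pct - pretest_pct


# Average tactical-action scores (Line 14 of Table 3) as quoted in the text.
tactical_action = {
    "experimental (Crews A and C)": (48.5, 62.5),
    "control (Crews B and D)": (19.0, 30.5),
}

gains = {
    group: performance_gain(pre, post)
    for group, (pre, post) in tactical_action.items()
}

for group, gain in gains.items():
    print(f"{group}: gain of {gain:.1f} percentage points")
```

The same subtraction, applied to the other four lines of Table 3, would yield the remaining entries of Table 7.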

TABLE 7

VISIBILITY RELATED TO AVERAGE PERFORMANCE GAIN IN FIVE AIR DEFENSE FUNCTIONS

[Functions, in order of decreasing visibility, with their line numbers
in Table 3: Tactical action (14), Track establishment (8), Input
processing (4), Track maintenance (12), Lateral-tell-out (16).
Columns: Function; Visibility; Gain from pretest to posttest
(percentage) for experimental and control crews.]

We have not yet developed a measuring scale for visibility. So far we
have only tentatively identified a few factors which may be useful in
developing such a scale. These are expressed below in terms of
questions concerning the performance of a function by an operator.
Affirmative answers indicate higher visibility.

1. Are the results of his actions displayed
to him?

2. Are the results of his actions displayed
directly to his supervisor?

3. Can the person who is affected by his 
actions communicate with him directly? 

4. Can he observe the activities of the per- 
son who may be affected by his actions? 

5. Does he receive information about other 
inputs which the recipient of his actions may 
be obtaining? 

6. Is he aware of the kind and distribution 
of information needed by the recipient? 

Let us apply these questions to two of the 
functions in Table 7 and see why “tactical 
action” is a high visibility function and 
“lateral-tell-out” is a low visibility function. 

The Intercept Director performs the tac- 
tical action function. He has a radar scope on 
which are displayed the positions in space of 
the interceptor track and the target track. 
Not only are the results of his vectoring in- 
structions immediately apparent to him but 
they are also available to his supervisor, the 
Senior Director. Indeed, since they are dis- 
played on the board the whole crew can see 
them. Thus, the results of all messages the 
Intercept Director transmits and all actions 
he takes are immediately visible to him, to his 
supervisor, and to his fellow crew members. 

The lateral teller, on the other hand, does 
not have such information. He transmits 
messages by phone to a plotter in another 
direction center. The lateral teller does not 
know whether the information he transmits 
was plotted correctly (even though it may 
have been repeated correctly verbally). He 
cannot plan ahead so as to pace his transmis- 
sions because he cannot foresee how busy with 
other inputs the plotter at the other end may 
be. There is no one to monitor the perform- 
ance of the teller-plotter combination and 
provide immediate corrective feedback; the 
supervisor of each is in a different direction 
center. Knowledge of results may come to 
the teller only after a long delay and usually 
in the form of a rather general description 
of failure of coordination between the two
sites.

It seems clear why there should be an inverse
relationship between the relative performance
gain and the visibility of a function.
In a man-machine system, where all functions
are interrelated and the effects of the behavior
of each individual proliferate widely, performance
effectiveness of a crew depends upon
the development of two kinds of skills: com- 
ponent skills, which relate to how well a man 
performs his own particular job; and inter- 
action skills, which relate to the mutual effect 
of the performance of crew members upon one 
another. If a function is visible, the operator 
performing it is able to see the consequence 
of his actions upon the performance of an- 
other crew member. Therefore it is possible 
for him to learn to adjust his behavior in 
accordance with the task requirements of the 
other man. If a function has low visibility, 
the operator will not receive information 
about the adequacy of his performance from 
the operational situation itself. If he is to 
improve at all it must be on the basis of 
knowledge of results provided in some other 
way. In this experiment, the “other way” was 
the postexercise report. 

Improvement of performance in high-visi- 
bility functions is not necessarily dependent 
upon the presence or absence of supplemen- 
tary knowledge of results. Improvement in 
low-visibility functions is so dependent. Since 
only the experimental crews received post- 
exercise knowledge of results, we would expect 
the difference in performance gain between 
the experimental and control crews to be 
greater with low-visibility functions. 

The results of the unique problem exercise merit separate discussion.
System training exercises incorporate a wide variety of problems which
stress all aspects of air defense operations. An attempt is made to
simulate situations which are likely to be encountered in wartime.
However, it is not possible to fully anticipate the strategy and
tactics which might be used by a potential invading force.
Consequently, it is necessary to train crews not only to deal with
situations which can be anticipated but also to respond effectively to
novel, unpredictable situations.

The experimental crews performed much 
better than the control crews in the unique 
problem exercise. This fact supports the sup- 
position that knowledge of results and de- 
briefing contribute to the development of 
crew adaptability. In debriefing the crew 
learns to identify and evaluate the reasons 
for their mistakes. They develop remedial
procedures which they can try out. Each
man discovers how his job relates to all the 
others in the complex system. In the debrief- 
ing, using the knowledge of results, they learn 
how to conduct air defense as a team and also 
how to solve air defense problems. They learn 
how to learn. This is manifested in opera- 
tions by rapid and accurate evaluation of the 
nature of a threat, whatever it might be, and 
by the initiation of an adequate and timely 
response to it. This is what we mean by crew 
adaptability. 

Two additional points should be mentioned. 
First, no attempt was made in this study to 
evaluate the relative importance of feedback 
of performance data and the discussion of 
these data in a debriefing session. This will be 
done in a later experiment. Second, considera- 
tion should be given to the possibility of 
designing or redesigning increased visibility 
into a system function so that more feedback 
is available within the operating situation. 
Besides the gain in operator performance which has been demonstrated,
designing for functional visibility has important implications for the
theory and practice of system management.


An experimental interest test, yielding scores on 5 homogeneous
scales, was administered to a sample of recruits on their 3rd day in
the Navy. Recruits were assigned to naval school training, on the
basis of expressed interest and aptitude test scores, by
classification personnel who did not have access to interest test
scores. Follow-up results are reported for 19,147 recruits assigned to
51 schools, each of which had a related scale on the interest test.
For students in each of the 51 schools, the mean score on the related
interest scale was significantly higher than the corresponding mean
score for the general recruit population. For 41 schools, the related
interest scales had statistically significant predictive validity
against a school grade criterion. Related interest scales contributed
significantly to operational aptitude tests in predicting school
success.



While considerable research has been performed relating interest
inventory scores to academic and professional criteria, very little
work has been done relating measured interests to subsequent choice of
or training success in technical or trade school programs. Patterson
(1956), in his review of the literature in the prediction of success
in trade or vocational courses, indicated that almost no work had been
done with interest tests. Some studies in military settings have been
reported. One by Brokaw (1956), using interest keys developed by
clustering items on the basis of their correlation with aptitude index
scores, obtained generally nonsignificant predictive validities for 13
Air Force technical schools. Schweiker (1959) reported that interest
scores, used in conjunction with the appropriate Air Force Aptitude
Index, did not make a stable contribution to the multiple correlation.
Kurz (1952) obtained significant relationships between Minnesota
Vocational Interest Inventory scores and success in two Naval Air
Technical Training programs. Mayo and Thomas (1956) found Minnesota
Vocational Interest Inventory scores to be related to subsequent
vocational choice for five Naval Air Technical Training programs. In a
follow-up study (Naval Air Technical Training Command, 1956), in four
of these five programs significant relationships were found between
interest inventory scores and course grades.

1 This paper is based on the senior author's presentation during the
First Symposium Defense North Atlantic Treaty Organization
Headquarters, Paris, France, July 27-29, 1960

The present paper reports a broad scale, longitudinal investigation of
the relationship of the recruit's measured interest to his subsequent
choice of Navy vocational training and to his success in such
training. The interest inventory used was composed of items directly
reflecting the tasks performed in the majority of the programs for
which predictions were to be made. This inventory was administered to
recruits shortly after their entry into the service. Interest scores
were related to subsequent choice of training program and eventual
performance in 57 different schools or curricula.


METHOD

The Instrument

The interest test used, the Navy Activities Preference Blank (NAPB),
was specially devised for purposes of the present investigation. The
NAPB incorporated 5 of 8 factors which emerged from a factor analysis
of 300 tasks performed in 58 different Navy specialties (Gordon &
Anderson, 1962). The five factors included were: Clerical,
Electrical-Electronic, Mechanical, Medical-Dental, and Hazardous Duty.
Two factors, Navigation and Service, were not introduced into the
blank since quotas for schools representing these factors were small.
A third factor, Aviation, was excluded since items defining the factor
generally had high loadings on factors already included in the blank.

The NAPB consisted of 40 triads, with three different factors being
represented in each triad, and every factor being matched with every
other factor an equal number of times. A specimen triad is presented
below:

1. Give shots with a hypodermic needle
2. Make a record of court martial proceedings
3. Repair or overhaul pumps

The respondent was asked to indicate which of the three tasks he liked
the most and which of the three tasks he disliked the most. Each
"liked" task was scored 2 on its factor, each "disliked" task was
scored 0 on its factor, and each unmarked task was scored 1 on its
factor. Since there were 24 items in each scale, scores on each scale
could have values from 0 to 48. Kuder-Richardson (Formula 20)
reliabilities and scale intercorrelations are presented in Table 1.

TABLE 1

INTERCORRELATIONS AMONG NAPB SCALES

[Rows and columns: C, EE, HD, M, MD.]

Note. The following NAPB scale abbreviations are used in this and in
other tables: C = Clerical; EE = Electrical-Electronic; HD = Hazardous
Duty; M = Mechanical; MD = Medical-Dental.
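
The scoring rule for the blank can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the key of 2 for "liked", 1 for unmarked, and 0 for "disliked" is the one implied by 24 items per scale and a score range of 0 to 48; the layout in `build_triads` (every combination of three factors used four times) is only one hypothetical arrangement consistent with "40 triads" and equal matching of factors, not the actual form; and `respond_by_ranking` is an invented respondent used to exercise the scorer.

```python
from itertools import combinations

FACTORS = ("C", "EE", "HD", "M", "MD")  # the five NAPB scales


def build_triads(repeats=4):
    """Hypothetical balanced layout: each 3-factor combination appears `repeats` times."""
    return [combo for combo in combinations(FACTORS, 3) for _ in range(repeats)]


def score_blank(responses):
    """Score (triad, liked, disliked) responses: liked = 2, unmarked = 1, disliked = 0."""
    scores = {factor: 0 for factor in FACTORS}
    for triad, liked, disliked in responses:
        if liked == disliked or liked not in triad or disliked not in triad:
            raise ValueError("liked and disliked must be two different tasks of the triad")
        for factor in triad:
            if factor == liked:
                scores[factor] += 2
            elif factor != disliked:
                scores[factor] += 1
    return scores


def respond_by_ranking(triads, ranking):
    """Invented respondent: likes the highest-ranked factor in a triad, dislikes the lowest."""
    responses = []
    for triad in triads:
        ordered = sorted(triad, key=ranking.index)
        responses.append((triad, ordered[0], ordered[-1]))
    return responses


triads = build_triads()
example_scores = score_blank(
    respond_by_ranking(triads, ["MD", "C", "EE", "M", "HD"])
)
print(example_scores)
```

Because each triad distributes 2 + 1 + 0 = 3 points, the five scale scores of any respondent sum to 120 under this key, so the scales are not independent of one another.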


Procedure

The Navy Activities Preference Blank was administered to all recruits
entering the Navy during a 15-month period of time. In all, a total of
105,819 recruits were tested. The blank was given during the regular
classification testing on the afternoon of the recruits' third day in
the Navy. Completed blanks were collected by research personnel and
were not used in subsequent classification decisions.

A classification interview was held with each recruit during his third
week in the Navy. The recruit's aptitude test scores, background
information, and expressed interest were all considered, in the light
of school quota requirements, in arriving at a mutually agreeable
decision as to the type of training program to which he would be
assigned. The recruit could not be assigned to a program that was
unacceptable to him. In general, an attempt was made to place the
recruit in the program of his choice.

Of the total sample tested, 50,916 recruits were assigned to a school
training program. The remainder, for the most part, did not meet
aptitude test requirements for school training. The school assignees
constituted the pool from which the samples for the present study were
drawn.

The criterion used for validating the NAPB consisted of final grades
achieved in school training. Visits were made to each school in order
to obtain information which would permit selection of validation
samples which were uniform in curriculum and grading procedures.

Fifty-seven school samples were selected for analysis. Schools having
less than 10 cases or having inadequate criteria were not included in
the analysis. For schools having very large inputs, the samples drawn
for analysis were restricted to about 600 cases each.

For each school sample, means and standard deviations were obtained
for each of the interest scales. Validities in the form of Pearson
product-moment coefficients of correlation were obtained between each
NAPB scale and the final school grade criterion. For reference
purposes, means and standard deviations on each interest scale were
also obtained for a random sample of 500 school eligible recruits.
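
The validity coefficient described above is the ordinary Pearson product-moment correlation between a scale score and the final grade. The following Python sketch shows the computation; the five score and grade values are made up purely for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical related-scale scores (0 to 48) and final school grades.
interest_scores = [12, 20, 25, 31, 40]
final_grades = [68, 74, 71, 80, 85]

validity = pearson_r(interest_scores, final_grades)
print(f"validity coefficient: {validity:.2f}")
```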


RESULTS

The items constituting the NAPB scales reflected tasks performed in
particular Navy specialties and taught in particular Navy schools. For
51 out of the 57 schools in the present study, it was possible to
identify some single NAPB scale whose items most closely described the
type of tasks taught in a particular school.

The Clerical scale reflected what was taught in 8 schools, the
electrical items on the Electrical-Electronic scale corresponded
closely to tasks performed in 7 schools, while electronic items on
this scale described tasks performed in 17 schools; the Mechanical
scale best described what was taught in 14 schools, while the
Medical-Dental scale reflected tasks performed in 5 schools. Tasks
described in the Hazardous Duty scale were not performed, in any
significant number, in any of the schools in the present
investigation.*

Tasks performed in the six remaining schools were best described by
items on the Navigation factor, which was not incorporated in the
NAPB. These schools were all part of the Naval Aviation Technical
Training program and are identified as Miscellaneous Aviation schools.

To simplify discussion, different curricula presented at the same
school location and the same curricula presented at different
geographical locations are each treated as different "schools."

* The Hazardous Duty scale will be separately validated for submarine,
underwater demolition team, and similar programs.

For purposes of data presentation and discussion, school samples have
been clustered as previously described. All schools within a cluster
are somewhat similar in terms of tasks taught, and with the exception
of the Miscellaneous Aviation schools are related to the same NAPB
scale.

TABLE 2

MEANS AND STANDARD DEVIATIONS OF INTEREST SCALES FOR SCHOOL SAMPLES AND FOR A GENERAL RECRUIT SAMPLE

Table 2 presents means and standard deviations of NAPB scales for all
57 schools as well as for the general recruit sample. Except for the
Miscellaneous Aviation schools, the significance of the difference
between each school mean on its related NAPB scale and the
corresponding mean for the general recruit sample is indicated. In
addition, for all schools, other means which are significantly higher
than the general recruit mean are noted.

For every one of the 51 schools having a related NAPB scale, the
related NAPB mean is significantly higher than the corresponding mean
for the general recruit sample. Thus, on the average, individuals
assigned to a particular school are significantly more interested in
performing the tasks taught in that school than are school eligible
recruits in general. In most instances, these differences are not only
statistically significant but are substantial in magnitude. The most
striking differences are found for the Medical and Dental schools,
where a positive preference for performing medical-dental tasks is
exhibited by the average Medical and Dental school assignee in
contrast to a decided aversion on the part of the average recruit.
Compared with the general recruit sample, small but significant mean
differences are found in favor of the Clerical schools on the
Medical-Dental scale, and in favor of the Medical-Dental and
Miscellaneous Aviation schools on the Clerical scale. There is some
tendency for Mechanical schools to have higher means than the general
recruit sample on the Hazardous Duty scale. This undoubtedly occurs
because most of the hazardous duty items involve mechanical tasks.

VALIDITY OF MEASURED INTERESTS

TABLE 3

PREDICTIVE VALIDITIES OF INTEREST SCALES FOR 57 NAVY SCHOOL SAMPLES

[Columns: School; uncorrected validities on each NAPB scale; corrected
validity on the related scale. School samples, by cluster:
Clerical: Aviation storekeeper (AK); Communications technician (CTA);
Personnel man (PN), 2 samples; Storekeeper (SK), 2 samples; Yeoman
(YN), 2 samples.
Electrical: Aviation electricians mate, 2 samples; Construction
electrician (CE); Electricians mate (EM), 2 samples; Interior
communications (IC), 2 samples.
Electronic: Aviation electronics technician (ATN, ATR); Aviation fire
control technician, 2 samples; Aviation guided missileman;
Communications technician (CTM); Electronics technician (ETN, 2
samples; ETR, 2 samples; ETS); Fire control technician (FT), 3
samples; Radarman (RD), 2 samples; Sonarman (SO).
Mechanical: Aviation boatswains mate (AB); Aviation machinist's mate
(ADR, 2 samples; ADJ, 2 samples); Aviation structural mechanic (AMH,
AMS); Boilerman (BT), 2 samples; Builder; Engineman (EN), 2 samples;
Machinery repairman (MR); Machinist's mate (MM).
Medical-Dental: Dental technician (DT); Hospitalman (HM), 4 samples.
Miscellaneous Aviation: Aerographer's mate (AG); Air controlman (ACW,
ACR, ACT); Photographer's mate (PHA, PHG).]

Validities, in the form of product-moment correlation coefficients
between interest scores and the school grade criterion, are presented
in Table 3. For 42 out of the 51 schools having a related NAPB scale,
the related scale has the numerically highest positive validity.
Statistically significant validities were obtained on related scales
for 5 out of 8 Clerical schools, 6 out of 7 Electrical schools, 15 out
of 17 Electronic schools, 11 out of 14 Mechanical schools, and 4 out
of 5 Medical-Dental schools. As previously indicated, the
Miscellaneous Aviation schools had no counterpart scales on the NAPB.

In addition, the Mechanical scale has significant negative validity
for Clerical schools in 7 out of 8 cases and for Medical-Dental
schools in 4 out of 5 cases. In a complementary manner, significant
negative validities are found for Mechanical schools on the Clerical
and Medical-Dental scales, the incidence being 10 out of 14, and 9 out
of 14 respectively. The Hazardous Duty key also exhibits negative
validity for Medical-Dental schools.

Gordon and Steinemann (1961) present some relevant, supporting
information regarding vocational preferences of the new recruit.

TABLE 4

CORRELATIONS BETWEEN INDIVIDUAL CLASSIFICATION BATTERY TESTS AND NAPB SCALES

[Rows: General Classification Test (GCT); Arithmetic Aptitude Test
(ARI); Mechanical Aptitude Test (MECH); Clerical Aptitude Test (CLER);
Electronic Technician Selection Test (ETST). Columns: NAPB scales.]

Since interests, as measured by the NAPB, appeared to be related, in
some manner, to type of school assignment, a statistical correction
was made, for related scales, to estimate the validities that would
occur if recruits were not assigned to schools on the basis of their
interest or variables related thereto. The validity coefficients,
corrected for restriction in range, are presented in the right-hand
column of Table 3.
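
The text does not say which correction formula was applied. As an illustration only, the Python sketch below uses the familiar univariate correction for direct restriction in range (often called Thorndike's Case 2), which rescales the observed correlation by the ratio of the unrestricted to the restricted standard deviation of the predictor. The function name and the numerical values are assumptions made for this example.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate (Case 2) correction of a correlation for restriction in range.

    r_restricted    : correlation observed in the selected group
    sd_unrestricted : predictor standard deviation in the unselected population
    sd_restricted   : predictor standard deviation in the selected group
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    k = sd_unrestricted / sd_restricted
    r2 = r_restricted ** 2
    return (r_restricted * k) / math.sqrt(1.0 - r2 + r2 * k ** 2)


# Hypothetical example: observed validity .30, population SD twice the school SD.
corrected = correct_for_range_restriction(0.30, sd_unrestricted=20.0, sd_restricted=10.0)
print(f"corrected validity: {corrected:.2f}")
```

When the selected group is more variable than the population (a ratio below 1), the same formula lowers the coefficient instead of raising it, which is one reason such a correction has to be interpreted with care.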

Correlations between interest scales and individual aptitude tests,
based on a representative general recruit sample of 500 cases, are
presented in Table 4. It may be noted that modest positive
relationships exist between similarly named measures of interest and
aptitude in the Clerical, Electronic, and Mechanical areas. The
correlations between interest scales and the aptitude test
combinations (sums of tests) actually used in assignment to each type
of school, are presented in Table 5. For each type of school, the
correlation for the related interest scale is boldfaced.

TABLE 5

CORRELATION BETWEEN TEST COMBINATIONS USED FOR ... SCHOOLS AND NAPB SCALES

Type of school      Test combination
Clerical            GCT and CLER
Electrical          GCT and MECH
Electronic          GCT, ARI, and ETST
Mechanical          ARI and MECH
Medical-Dental      GCT and ARI

[Remaining columns: correlations with each NAPB scale.]

In view of the significant relationship between interest scores and
school success, and the low relationship between interest and aptitude
measures, the contribution made by interest scores to the predictive
effectiveness of the aptitude composites used in selection was
determined. For simplicity of presentation, this information is
summarized, utilizing the previously established school clusters. In
assessing the contribution of interest scores, only the single related
NAPB scale was considered.

TABLE 6

MULTIPLE CORRELATIONS OF APTITUDE TESTS WITH SCHOOL GRADES, AND OF APTITUDE TESTS AND RELATED NAPB SCALE WITH SCHOOL GRADE FOR ALL SCHOOL SAMPLES, AVERAGED BY TYPE OF SCHOOL

[Rows: Clerical; Electrical; Electronic; Mechanical; Medical-Dental.]

Note. Only those aptitude tests ... included in each multiple
correlation.
* Using uncorrected NAPB val... is .64.

For each school cluster, the average multiple correlation between the
aptitude tests used in selection and school grades is presented in
Table 6. The multiple correlations were based on validities which had
been corrected for restriction in range due to direct selection. The
average multiple correlation resulting when the related interest scale
is added is also presented in Table 6. It may be noted that on the
average from .01 to .04 is added to the multiple correlations by the
addition of the related interest scale except for Electronic schools.
All increments are statistically significant at the .01 level of
confidence.
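
To see how a small increment of this kind can arise, the sketch below computes the multiple correlation for two predictors from their validities and their intercorrelation. It is a simplification under stated assumptions: the aptitude composite is treated as a single predictor, and all numerical values are hypothetical rather than taken from Table 6.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1 : validity of predictor 1 (here, an aptitude composite)
    r_y2 : validity of predictor 2 (here, the related interest scale)
    r_12 : correlation between the two predictors
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("inconsistent set of correlations")
    return math.sqrt(r_squared)


# Hypothetical values: aptitude validity .50, interest validity .30,
# aptitude-interest correlation .20.
aptitude_only = 0.50
with_interest = multiple_r_two_predictors(0.50, 0.30, 0.20)
print(f"increment in R: {with_interest - aptitude_only:.3f}")
```

With these illustrative inputs the increment is about .04. The gain shrinks as the interest scale overlaps more with the aptitude composite or as its own validity falls.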
DISCUSSION


The results of the present analysis suggest that the recruits'
interests, as measured upon their entry into the service, are related
to the choice of training programs made by them some 3 weeks later.
This relationship is reflected by the significantly higher average
scores made on the relevant interest scales for students who select or
agree to accept, and are assigned to particular training programs. The
degree to which measured interest is related to subsequent vocational
choice is particularly striking in the Medical-Dental area, where, on
the average, a positive liking for medical and dental tasks is
expressed by school assignees, as contrasted with a definite aversion
for such tasks among recruits in general.

Present findings indicate that strength of measured interest is
predictive of subsequent success in vocational training, even for
individuals who are selected on factors related to measured interest
(such as expressed interest and relevant background variables). The
magnitude of the relationships for clerical, mechanical, electrical,
and electronic specialties is modest. However, for the medical
specialty, the relationship is substantial.

Validities, corrected for restriction in range, provide an estimate of
relationships which would obtain if individuals had not been assigned
to schools on the basis of factors related to measured interest. The
magnitude of these validities argues for the importance, in military
classification, of considering the individual's interest. These
findings have implications for the consideration of interest measures
in guidance for civilian trade training programs as well.

The uncorrected and corrected validities of the Medical-Dental scale
for medical and dental schools are most interesting. In the general
recruit population, there is a strong aversion toward performing
medical or dental tasks. Under present assignment procedures,
individuals assigned to medical and dental schools have interests
generally ranging from indifferent to enthusiastic. Thus, on this
scale, the variance for the general population is substantially
smaller than that for the selected group, rendering the application of
the formula which corrects the validity for restriction due to
selection of questionable appropriateness. In actual practice, if
interests were not considered in school assignment, the vast majority
of individuals assigned to these schools would score very low on the
Medical-Dental scale, and attrition could conceivably reach a point
where the schools would be unable to fulfill their function.

The intercorrelations among the interest scales, the patterning of
means, and the patterning of negative and positive validities obtained
for the Clerical, Medical-Dental, and Mechanical schools, are
reminiscent of the "interest in 'clean hands' versus 'dirty hands'
occupations" factor identified by Zachert (1952).

The present interest test was composed exclusively of descriptions of
tasks performed in the majority of the training programs for which
predictions were to be made. Where items in the test did not
substantially reflect tasks performed in particular schools,
validities were nonsignificant or nonconsistent. For example, none of
the scales included any substantial number of tasks performed in the
miscellaneous aviation schools. The absence of related tasks for this
group of schools appears to have resulted in an "indifference pattern"
in the NAPB means. Although, for these miscellaneous aviation schools,
some positive validity may be noted for the Electrical-Electronic
scale, paradoxically, school means on this scale are lower than the
corresponding mean for the general recruit sample. By way of further
example, two other types of schools which did not have adequate task
representation on the NAPB had nonsignificant validities on "related"
scales. These were Personnel Man schools, where emphasis is on tasks
other than those characteristic of the Clerical scale; and Builders
school, where carpentry tasks, minimally represented in the Mechanical
scale, are taught.

The foregoing suggests that the nature of 
the items that make up an interest inventory 
may circumscribe the applicability of that in- 
ventory to certain groups of occupations. 
This may very well occur even in instru- 
ments which are not composed primarily of 
task-type items. It would therefore appear to 
be advisable not only to explore and report 
areas of applicability for particular interest 
inventories, but to specify their occupational 
“blind spots” as well. 
RESPONSE SETS AND FACTOR LOADINGS ON
SIXTY-ONE PERSONALITY SCALES¹
Intercorrelations between 58 MMPI and 3 other personality scales,
based upon the scores of 151 students, were factor analyzed and the
factors rotated orthogonally. Loadings of the scales on the 1st
factor correlated .90 with the proportion of items keyed for
socially desirable responses and .98 with the zero-order
correlations of the scales with the Social Desirability (SD) scale.
The proportion of keyed True items correlated .82 with the loadings
of the scales on the 2nd factor. The Lie and 3 other scales similar
to the Lie scale had substantial loadings on the 3rd factor. The 1st
factor is interpreted as reflecting the tendency to give socially
desirable responses, the 2nd as reflecting the tendency to
acquiesce, and the 3rd as reflecting the tendency to falsify
answers.


It has been shown that the probability of a True response to a
personality item can be predicted quite accurately from a knowledge
of the social desirability scale value of the item (Cowen & Tongas,
1959; Edwards, 1953, 1957a, 1957b; Hanley, 1956). If the trait or
behavior represented by the content of an item can be judged as
socially desirable or undesirable, then it is also possible to
classify a subject’s response to the item as socially desirable or
undesirable. A socially desirable response is defined as a True
response to an item with a socially desirable scale value or a
False response to an item with a socially undesirable scale value.
A socially undesirable response is defined as a True response to an
item with a socially undesirable scale value or a False response to
an item with a socially desirable scale value (Edwards, 1957b).

A 39-item scale designed to measure the tendency of subjects to
give socially desirable responses has been developed by Edwards
(1957b). The scale is called a Social Desirability (SD) scale and
all of the items in the SD scale are keyed for socially desirable
responses. It should be obvious, however, that the concept of an SD
scale is not limited to the particular one developed by Edwards.
Any personality scale in which all of the items are keyed for
socially desirable response may be described as an SD scale,
regardless of the manner in which the scale was developed or the
specific personality trait which the scale supposedly measures. For
example, if the scoring key for a scale designed to measure the
trait of Dominance is such that all of the items are keyed for
socially desirable responses, then this scale may also be described
as an SD scale. If the Dominance scale is, in fact, another form of
the SD scale, then scores on the scale should correlate positively
with those on the SD scale. Similarly, if all of the items in a
given personality scale are keyed for socially undesirable
responses, then scores on this scale and the SD scale should
correlate negatively.

1 This research was supported in part by Research Grant M-4075 from
the National Institute of Mental Health, United States Public
Health Service.

Edwards (1957b) has argued that the tendency to give socially
desirable responses in self-description, as measured by the SD
scale, is a general trait reflected in scores on a wide variety of
True-False personality scales. If a subject has a strong tendency
to give socially desirable responses, as measured by a high score
on the SD scale, and if this is a general trait, then he should
also obtain relatively high scores on all other personality scales
in which the items are keyed for socially desirable responses and
low scores on those scales in which the items are keyed for
socially undesirable responses. If a subject has a low score on the
SD scale we might regard this as indicating a weak tendency to give
socially desirable responses and we might expect such subjects to
respond more to the content of an item independently of the item’s
social desirability stimulus value. We may also argue, however,
because of the nature of the items in the SD scale, that low scores
represent a strong tendency to give socially undesirable responses,
and that this is also a general trait. If this is the case, then
subjects with low scores on the SD scale may be expected to obtain
low scores on other personality scales in which the items are keyed
for socially desirable responses and high scores on those scales in
which the items are keyed for socially undesirable responses.

The above argument suggests that if we intercorrelate a large
number of personality scales with varying proportions of items
keyed for socially desirable responses and factor analyze the
resulting correlation matrix, we would obtain a bipolar social
desirability factor. Scales which have a large proportion of keyed
socially desirable responses should have loadings on one pole of
the factor and scales which have a large proportion of keyed
socially undesirable responses should have loadings on the opposite
pole of the factor. More specifically, we should be able to predict
the sign and magnitude of the loading of a scale on the bipolar
factor from a knowledge of the social desirability keying of the
scale. Since the SD scale is assumed to be a relatively pure
measure of the social desirability factor, the loadings of the
scales on the bipolar factor should also be directly related to the
zero-order correlations of the scales with the SD scale.

If a personality item has a neutral social desirability scale
value, then the socially desirable response to the item is
undefined. It has been suggested by Hanley (1956) and Edwards
(1957b) that scales which contain a substantial proportion of
neutral items should be less subject to the influence of social
desirability tendencies. Suggestive, but limited, evidence on this
point has been obtained by Gocka (1960). If it is true that as the
proportion of neutral items increases scores are less influenced by
social desirability tendencies, then we should expect scales with a
large proportion of neutral items to have lower absolute
correlations with the SD scale and also lower absolute loadings on
the bipolar factor than scales with a small proportion of neutral
items.


The present study was undertaken to de- 
termine whether, as suggested above, a social 
desirability factor would be obtained in a fac- 
tor analysis of True-False personality scales 
and whether the loadings on this factor would 
be related to the correlations of the scales 
with the SD scale and the proportion of keyed 
socially desirable responses in the scales. In 
addition, we wished to determine whether the 
absolute values of the social desirability fac- 
tor loadings and of the correlations of the 
scales with the SD scale would be negatively 
related to the proportion of neutral items in 
the scales. 
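The two scale-level quantities named in this paragraph, the proportion of keyed socially desirable responses and the proportion of neutral items, can be computed directly from item scale values and the scoring key. The following is a minimal sketch, not the authors' procedure: the `KeyedItem` type, the midpoint of 5.0, the neutral interval, and the item values are all hypothetical, and an item is treated as desirable only when its value lies above the midpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyedItem:
    sd_value: float   # social desirability scale value (hypothetical units)
    keyed_true: bool  # True if the scale's key scores a True answer


def keyed_response_is_desirable(item, midpoint):
    # A keyed response is socially desirable when it is True to a desirable
    # item or False to an undesirable item.
    item_is_desirable = item.sd_value > midpoint
    return item.keyed_true == item_is_desirable


def proportion_keyed_desirable(items, midpoint):
    return sum(keyed_response_is_desirable(i, midpoint) for i in items) / len(items)


def proportion_neutral(items, neutral_low, neutral_high):
    # Share of items whose scale value falls inside the neutral interval.
    return sum(neutral_low <= i.sd_value <= neutral_high for i in items) / len(items)


# Hypothetical four-item scale.
scale = [
    KeyedItem(7.5, True),
    KeyedItem(2.0, False),
    KeyedItem(8.0, False),
    KeyedItem(4.8, True),
]
print(proportion_keyed_desirable(scale, midpoint=5.0))
print(proportion_neutral(scale, neutral_low=4.5, neutral_high=5.5))
```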

METHOD 

The Minnesota Multiphasic Personality Inventory (MMPI) was
administered under standard instructions to 151 male students at
Western Washington College of Education.2 In addition, all students
were given the Couch and Keniston (1960) Agreement Response Set
(ARS), the Crowne and Marlowe (1960) Social Desirability (MC-SD)
scale, and an experimental Forced Choice Social Desirability
(FC-SD) scale. The MMPI records were scored for 58 MMPI scales.
Intercorrelations between all 61 scales were obtained and the
resulting correlation matrix was factor analyzed by the principal
components method. Ten factors were extracted accounting for 75% of
the total variance. The 10 factors were rotated orthogonally
according to a criterion which maximized the variance of the
squared factor loadings for each successive simple structure
approximation vector. We are concerned, in this study, only with
the first three rotated factors. These three factors account for
38, 10, and 8% of the total variance and 43, 18, and 9% of the
common variance, respectively.

Social desirability scale values for the MMPI items have been
obtained by Heineman (1953) and are reproduced in Dahlstrom and
Welsh (1960). Using Heineman’s scale values, the proportion of
items keyed for socially desirable responses in each of the 58 MMPI
scales was obtained.4 In determining these proportions, we
arbitrarily considered all items falling beyond the midpoint of the
neutral interval on the socially desirable end of the psychological
continuum as having socially desirable scale values and all those
on the other side of the midpoint as having socially undesirable
scale values. The items in the ARS and MC-SD were judged for social
desirability by students at the University of Washington and the
scale values thus obtained were used to determine the proportion of
items in these two scales keyed for socially desirable responses.
Because of the nature of the FC-SD scale, it was not included in
this part of the study.

2 The test records were obtained through the cooperation of Charles
Harwood of Western Washington College of Education.

3 The principal axis factor analysis and rotation programs for the
IBM Type 709 are described in mimeographed reports by Nussbaum,
Marshall, and Roosen-Runge (1961) and George Burket (1961).

4 Successive interval social desirability scale values of MMPI
items have been obtained by Messick and Jackson (1961), but were
not available at the time this research was undertaken. The Messick
and Jackson scale values correlate .964 with those obtained by
Heineman (1953).

TABLE 1

ROTATED FACTOR LOADINGS

[Only part of the Factor III column is legible; entries are given as
they survive.]

Scale        III
18 Fm        18
22 Hy-O      .06
23 Hy-S      O01
24 Te        19
26 K         .06
27 L         —.65
28 Lp        —.14
29 Ma        —.25
30 Ma-O      —.10


RESULTS AND DISCUSSION 


Table 1 gives the factor loadings of the scales on the first three
rotated factors.5

Table 2 shows the product-moment correlations between the loadings
of the 60 scales on the first three factors and the proportion of
keyed socially desirable items in each scale. It is clear from the
magnitude of the correlations that the loadings of the scales on
the first factor, which is bipolar, can be quite accurately
predicted from either knowledge of the proportion of keyed socially
desirable responses in the scales or the zero-order correlations of
the scales with the SD scale. The relationship between the factor
loadings and the proportion of keyed socially desirable items and
the relationship between the factor loadings and the correlations
of the scales with the SD scale are consistent with results
obtained by Edwards and Heathers (1962) and by Edwards and Walker
(1961) in their reanalyses of data reported by Kassebaum, Couch,
and Slater (1959) and by Couch and Keniston (1961), respectively.

5 A 1-page table giving the unrotated factor loadings has been
deposited with the American Documentation Institute. Order Document
No. 7124 from ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress; Washington 25, D. C., remitting in
advance $1.25 for microfilm or $1.25 for photocopies. Make checks
payable to: Chief, Photoduplication Service, Library of Congress.

TABLE 1 (continued)

[Scale labels only; loadings are not legible: Ma-S, Mf, Mp, Ne, No,
Nu, Or, 38 Pa, 39 Pa-O, 40 Pa-S, 41 Pd, 42 Pd-O, 43 Pd-S, 44 Pn,
45 Pr, 46 Pt, ..., 60 MC-SD, 61 FC-SD]

TABLE 2

CORRELATIONS OF FACTOR LOADINGS WITH PERCENTAGE OF KEYED SD ITEMS
IN 60 SCALES AND WITH THE CORRELATION OF THE 60 SCALES WITH THE
SD SCALE

Rows: Correlation with percentage of keyed SD items;
      Correlation with SD scale
[Tabled coefficients are not legible.]

For each of the 60 scales, we also obtained 
the proportion of items with scale values in 
the neutral interval on the social desirability 
continuum. These proportions were then cor- 
related with the absolute magnitudes of the 
loadings of the scales on the first three fac- 
tors and with the absolute magnitudes of the 
correlations of the scales with the SD scale. 
The correlations with the factor loadings were —.57, .38, and .44,
respectively, and the correlation with the SD correlation was —.92.
We see that those scales which contain a large proportion of
neutral items tend to have low absolute loadings on the first
factor and also low absolute correlations with the SD scale. At the
same time, scales with a large proportion of neutral items also
tend to have large loadings, without regard to sign, on the second
and third factors.

6 In the present study, the correlation between the proportion of
keyed socially desirable items and the proportion of keyed True
items is —.45 for the 60 scales.

It has been suggested by Jackson and Messick (1958) that an
important response set influencing scores on MMPI scales is
acquiescence or the tendency to respond True to personality items.
As an index of acquiescence in a scale, they suggest using the
proportion of items in the scale keyed for True responses.
Although, as has been pointed out by Hanley (1961) and Edwards
(1961), this index is often confounded with the keying of the same
items for socially desirable or socially undesirable responses,6
we obtained the proportion of keyed True responses in each of the 60
scales and correlated these proportions with 
the loadings of the scales on each of the three 
factors. These correlations are —.62, .82, and 
—.19, respectively. The proportion of keyed 
True responses in the scales is a better pre- 
dictor of the loadings of the scales on the 
second factor than the proportion of keyed 
socially desirable responses, whereas the re- 
verse is the case for the first factor. We be- 
lieve that the first factor can be best described 
as a social desirability factor and the second 
factor as an acquiescence factor. Additional 
evidence on this point can be obtained by 
examining the loadings of the FC-SD scale on 
the factors. 
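The comparison above rests on product-moment correlations computed across scales, with one value per scale for the keying proportion and one for the factor loading. A minimal sketch of that calculation follows; the `pearson` helper and the five pairs of values are hypothetical and are not taken from the study's tables.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five scales (one entry per scale).
prop_keyed_true = [0.20, 0.35, 0.50, 0.70, 0.90]
loading_factor_2 = [-0.30, -0.10, 0.05, 0.40, 0.55]
print(round(pearson(prop_keyed_true, loading_factor_2), 2))
```

The same helper applies to the neutral-item analysis if the absolute values of the loadings are passed in place of the signed loadings.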

The experimental FC-SD scale consists of 
28 AB pairs of personality statements in 
which one member of each pair has a slightly 
higher social desirability scale value than the 
other. In half of the pairs the A statement 
has the higher scale value and in the other 
half the B statement has the higher scale 
value. Scores on the scale consist of the num- 
ber of times the subject has chosen the state- 
ment with the higher social desirability scale 
value. We do not see how acquiescence or the 
tendency to respond True could operate in 
such a scale, although this does not rule out 
the possibility that scores on the scale could 
be correlated with other measures of acquiescence
or that the scale might possibly have a
loading on the second factor of our analysis. 
However, the fact that the FC-SD scale has 
loadings of .69, —.13, and .00, on the three 
factors is, we believe, supporting evidence for 
the statement that the first factor is primarily 
a social desirability factor, whereas the second 
factor seems to be primarily involved with 
the imbalance in the True-False keying of the 
scales investigated. 
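The scoring rule for a balanced forced-choice scale of this kind can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the alternating order of the key is invented, since the text says only that A carries the higher scale value in half of the 28 pairs and B in the other half.

```python
def fc_sd_score(higher_sd_member, choices):
    """Count the pairs in which the subject chose the higher-SD statement."""
    if len(higher_sd_member) != len(choices):
        raise ValueError("key and choices must have the same length")
    return sum(1 for key, choice in zip(higher_sd_member, choices) if key == choice)


# Hypothetical key: A is higher in 14 pairs and B in the other 14.
key = ["A", "B"] * 14

# A subject who always picks the A statement, whatever its content.
always_a = ["A"] * 28
print(fc_sd_score(key, always_a))  # 14
```

A fixed position preference earns exactly the midpoint score of 14, which is why a simple response bias of that kind cannot raise scores on a scale balanced this way.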

It is clear, from Table 1, that the purest measure of the third
factor is Wiggins’ (1959) Sd scale which has a loading of —.97 on
the factor. This scale was developed by comparing the responses of
subjects instructed to give socially desirable responses to MMPI
items with those of subjects taking the test under standard
instructions. Items which differentiated between the two groups
were used to form the Sd scale. Similarly, the Mp scale, which has
a loading of —.87 on the third factor, was developed by Cofer,
Chance, and Judson (1949) to detect MMPI records of subjects
instructed to make a good impression. These investigators suggest
that the scoring key of the Mp scale is such that the scale might
be regarded as a subtle L scale. The L scale, which also has a
substantial
loading of — .65 on the third factor, is de- 
scribed by Hathaway and McKinley (1951) 
as “a measure of the degree to which the sub- 
ject may be attempting to falsify his scores 
by always choosing the response that places 
him in the most acceptable light socially.” 
The MC-SD scale has a loading of — .61 on 
the third factor. Crowne and Marlowe (1960) 
state that they developed the MC-SD scale 
in terms of “the rationale underlying the Lie 
scale of the MMPI.” They add, however, 
that their items “are probably less extreme 
than the Lie items” (p. 350). 

Although the four scales, Sd, Mp, L, and 
MC-SD, which have substantial negative 
loadings on the third factor have been con- 
sidered by their authors as measuring the 
tendency of subjects to claim favorable but 
improbable traits, it should be clear, on the 
basis of the factor loadings of the scales, that 
this tendency is not identical with the ten- 
dency to give socially desirable responses as 
measured by the SD scale. The SD scale was 
designed to measure the tendency of subjects 
to give socially desirable responses under the 
usual or standard instructions of administration of personality
scales. The Sd and Mp scales, on the other hand, were designed to
determine the degree to which a subject could and would change or
falsify his responses in a favorable direction when specifically
instructed to do so. In essence, the theory underlying the Sd, Mp,
and MC-SD scales is that of the L scale which was designed to
measure the tendency of a subject to claim 
desirable but improbable characteristics or, 
to put it bluntly, to lie. We suggest, there- 
fore, that a possible interpretation of the four 
scales with negative loadings on the third fac- 
tor is that they may be measuring the ten- 
dency of subjects to lie. 

Some supporting evidence that the MC-SD 
scale may be measuring the willingness of 
subjects to lie is available from a study by 
Marlowe and Crowne (1961). High and low 
scoring subjects on the MC-SD scale were as-
signed a spool-packing task which is de- 
scribed by the authors as “boring and repeti- 
tive.” After working at the task for a period 
of 25 minutes subjects were asked to rate 
their attitudes toward the task. It was found 
that the high scoring subjects rated the task 
as more enjoyable than low scoring subjects. 
Marlowe and Crowne interpret this finding as 
evidence that the MC-SD scale is measuring 
the need for social approval, but an equally 
appealing interpretation is that the high scor- 
ing subjects were more willing to lie than the 
low scoring subjects. It is possible, of course, 
that their lying may have been motivated 
by a need for social approval, but we suggest 
that not all individuals who have a strong 
need for social approval are willing to lie in 
order to gain approval. In our culture, in gen- 
eral, we attach greater social approval to 
truth-telling than to lying. Perhaps the trait 
represented by those scales with negative 
loadings on the third factor corresponds to a 
tendency to lie, when it is believed that lying 
may bring social approval. 

Whatever the nature of the third factor, it is obvious that
considerable additional research is needed before we can clearly
identify its implications for personality assessment and behavior.
For example, our interpretation of the third factor is complicated
by two additional facts. In the first place, we know that the Sd
scale has 14 items in common with the Mp scale and 7 in common with
the L scale, and that the Mp and L scales have 6 items in common.
These common items are all scored in the same direction. Similarly,
Mf and Or have 5 items in common that are scored in the same
direction. In general, items in common between those scales with
negative loadings (Sd, Mp, and L) and those with positive loadings
(Or, Mf, and Cn) tend to be keyed in opposite directions. Thus, the
loadings of these scales on the third factor may be due in part to
dependencies in common item scoring. In the second place, we know
that the absolute loadings of the scales on the third factor are
positively correlated with the proportion of neutral items in the
scales. All of the scales with large loadings on the third factor
have a large proportion of neutral items and this common
characteristic may also be responsible in part for the loadings of
the scales on the third factor.

Edwards, A. L., & Heathers, Louise B. The first factor of the MMPI:
Social desirability or ego-strength? J. consult. Psychol., 1962,
26, 99-100.
THE PICTURE TEST: RATIONALE AND ONE VALIDATION
OF THE METHOD


ELI S. FLYER 


Personnel Laboratory, Lackland Air Force Base, Texas 


AND FRANCES M. CARP 


Trinity University 


A procedure has been designed that capitalizes on halo variance in ratings. To 
test the hypothesis of affective transfer from picture material to response cate- 
gories, over 300 Ss were tested with one format of the Picture Test, obtaining, 
indirectly, their attitudes towards 8 religious groupings. Objective scoring of re- 
sponses to the religious categories was accomplished, and Ss grouped by their 
stated religious affiliation. Distributions of responses to the religious categories 
were consistent and supportive of the test’s rationale—that affective transfer 
from picture material to response categories can occur with the method, and 
that responses can be meaningfully related to known group characteristics. 


“Halo” variance in rating behavior is fa- 
miliar to most psychologists. When individu- 
als are required to evaluate others for one di- 
mension of behavior or another, responses 
often seem to be colored by seemingly ex- 
traneous factors—likeability of the rated per- 
son, for example. While halo is usually con- 
sidered an unfortunate concomitant of the 
rating process, the phenomenon has provided 
the basis for an objective-projective test use- 
ful in attitude, interest, and personality re- 
search. 

The Picture Test has been under develop- 
ment for over 10 years. Early studies of the 
procedure have been reported by Flyer (1951, 
1952) and DeRath and Carp (1959). Ra- 
tionale of the method is that the halo process 
in ratings can be utilized to tell something of 
value about the rater, a point that has also 
been raised by Campbell (1950). The most 
recent version of the technique involves two 
types of material. A group of 96 sketches of 
men (full face, head only) has been devel- 
oped that are ordered in sets of eight draw- 
ings to a page, and bound in a booklet. Placed 
randomly within each set of eight, two of the 
men are generally liked by male subjects, two 
generally disliked, and four viewed as “neutrals.”1 The remaining
material comprising the test consists of a write-in answer sheet
with certain categories listed. The categories may represent
personality characteristics, occupations, national or religious
groups, etc. The format used most often consists of eight
categories drawn from a single domain. Subjects are required to
examine each set of eight pictures, and to classify each picture
separately in terms of one of the categories so that each picture
is placed once in a category, and no category used more than once
for each page of picture material. This is repeated with the
remaining 11 pages of the Picture Test booklet. Scoring is
accomplished by determining the number of liked minus disliked
pictures placed in each category, with a + sign indicating a net
score on the like side of the continuum, and a — sign indicating
the converse. The maximum score an individual can obtain for any
one category is +12 (12 liked pictures placed in that category) or
—12 (12 disliked pictures placed in that category).

Interpretation of liked-disliked picture placements is based upon
the hypothesis that affective responses to the picture material are
transferred to the categories used in their placement. The purpose
of this report is to describe an investigation that tests this
hypothesis directly.

1 Acknowledgment is made to the Hogg Foundation, University of
Texas, for material support in the development of the Picture Test
booklet. A considerable amount of experimental testing was required
to develop sets of pictures with required popularity values.




PROCEDURE 

In the study reported here eight religious cate- 
gories were used as part of the test format. Over 
300 male subjects between 17 and 19 years of age, 
75% of whom were high school graduates, were re- 
quired to classify the eight people on each Picture 
Test page in terms of eight religious categories. Scoring
was accomplished by totaling separately for each
subject the number of liked and disliked pictures 
placed in each category, and obtaining a net score 
for each category. The religious affiliation of each 
subject was obtained directly, and cases grouped by 
stated membership group. 
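The net score described here (liked minus disliked placements per category, with each category used once per page) can be sketched as below. The data layout is an assumption made for illustration: picture identifiers, the `valence` lookup, and the three-category example are hypothetical, whereas the actual test uses eight pictures and eight categories on each of 12 pages.

```python
def net_scores(pages, valence, categories):
    """Return liked-minus-disliked counts per category.

    pages: list of dicts mapping picture id -> category chosen on that page.
    valence: dict mapping picture id -> "liked", "disliked", or "neutral".
    """
    scores = {category: 0 for category in categories}
    for page in pages:
        # Every category must be used exactly once on each page.
        if sorted(page.values()) != sorted(categories):
            raise ValueError("each category must be used exactly once per page")
        for picture, category in page.items():
            if valence[picture] == "liked":
                scores[category] += 1
            elif valence[picture] == "disliked":
                scores[category] -= 1
    return scores


# Hypothetical miniature example: two pages, three pictures per page.
categories = ["A", "B", "C"]
valence = {1: "liked", 2: "disliked", 3: "neutral",
           4: "liked", 5: "disliked", 6: "neutral"}
pages = [{1: "A", 2: "B", 3: "C"}, {4: "A", 5: "C", 6: "B"}]
print(net_scores(pages, valence, categories))
```

Neutral pictures leave the score unchanged, so a category that receives only neutrals stays at zero.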


RESULTS 


Grouping subjects by their stated religious 
affiliation, means were obtained for like-dis- 
like picture placement in each religious cate- 
gory. Among Catholic subjects, the mean like- 
dislike picture placement in the Catholic 
religious category was 7.32, for Methodist 
subjects in the Methodist religious category 
6.56, Episcopalian 6.33, Baptists 6.74, Lu- 
therans 5.76, and Presbyterians 5.86. Predic- 
tion of religious affiliation through like-dislike 
picture placement was quite high, i.e., the 
biserial correlation coefficient in predicting 
Baptist-non-Baptist affiliation from Picture 
Test scores in the Baptist category was .80 
(Baptist sample mean 6.74, non-Baptist mean 
1.12, with a standard deviation total of 4.32). 
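The reported biserial coefficient can be related to the quoted means and standard deviation with the textbook biserial formula, r = ((M1 − M0) / SD) × (pq / y), where y is the normal ordinate at the point dividing the two groups. The sketch below assumes that formula; the proportion of Baptist subjects is not reported, so the value p = .50 used here is purely hypothetical.

```python
from statistics import NormalDist


def biserial(mean_group, mean_rest, sd_total, p_group):
    """Biserial r from group means, total SD, and the group's proportion."""
    q_group = 1.0 - p_group
    z_cut = NormalDist().inv_cdf(p_group)  # normal deviate at the dividing point
    ordinate = NormalDist().pdf(z_cut)     # height of the normal curve there
    return (mean_group - mean_rest) / sd_total * (p_group * q_group) / ordinate


# Means and SD from the text; p_group is an assumed, unreported value.
print(round(biserial(6.74, 1.12, 4.32, p_group=0.50), 2))
```

With an even split the sketch gives a value of roughly .82, in the neighborhood of the reported .80; the exact figure depends on the unreported proportion.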


DISCUSSION AND IMPLICATIONS 


The results are clear and consistent in dem- 
onstrating an affective transfer from picture 
material to religious categories. Catholic sub- 
jects tended to see likeable people as Catho- 
lic; Methodist subjects perceived likeable peo- 
ple as Methodist. Similar results were ob- 
tained for each religious affiliation 
studied. 

For purposes of predicting religious affilia- 
tion, however, a single question would pro- 
vide the most economical and valid method. 
The investigation presented here is an exer- 
cise to demonstrate a method for indirect 
determination of religious affiliation. If the 


group 


method were successful here with a fairly firm 
criterion, might it not be used to obtain in- 
directly affective responses to attitude and 
personality categories, particularly categories 
where valid direct responses are more difficult 
to come by? For example, if eight negative 
personality or behavioral categories are used, 
such as “may commit suicide,” “homosexual,” 
“persecuted by people,” etc., is there a pos- 
sibility that significant affective transfer to 
one or more of these categories might be more 
revealing of the individual’s attitude toward 
these categories than would be a direct ques- 
tion? If occupational categories are used, and 
if an engineering student who professes mo- 
tivation to become an engineer places signifi- 
cant numbers of disliked pictures in that cate- 
gory, might not this reflect more revealingly 
the individual’s attitude toward that occu- 
pation? 

Research to date cannot answer these ques- 
tions definitively. What evidence there is, 
however, suggests that indirect assessment of 
attitudes, interests, and personality factors 
through the Picture Test method holds con- 
siderable promise. Experimental studies are 
required to determine the reliability and va- 
lidity of the method in various formats. Dem- 
onstrations of its possible usefulness in voca- 
tional counseling, predicting behavior pathol- 
ogy, and in attitudinal research are part of 
the long range research effort now underway. 
EFFECT OF ORGANIZATION SIZE ON VALIDITY
OF MASCULINITY-FEMININITY SCORE 


ALBERT PORTER 


School of Business, San Jose State College 


A population of 195 pre-1944 male Stanford Graduate School of Business MBAs 
was analyzed for association between executive success criteria and scores on 
the Strong Vocational Interest Blank Masculinity-Femininity (SVIB MF) scale, 
controlling for size of employing organization (large N = 47, medium N = 105, 
and tiny N = 43). No significant correlations were found between the SVIB MF 
scores and pay, job interest, or career progress satisfaction. Correlations were 
significant (p < .05) between MF scores and policy level and organization level, 
but were negative in large organizations and positive in medium and small or- 
ganizations. The pattern of correlations suggests that masculinity of interests is 
positively associated with executive success in smaller organizations, inversely 
in larger organizations; more research is needed to test such a hypothesis and 
to explore the “meaning” of MF. 


Williams (1959) found significant correlation between reported pay
for 1958, and pre-1944 score on the Masculinity-Femininity (MF)
scale of the Strong Vocational Interest Blank for Men (SVIB).
Subjects were 126 male alumni of the Stanford Graduate School of
Business (SGSB) who received the MBA prior to 1944, took the SVIB
while enrolled, responded to the SGSB’s 1958 alumni vocational
survey (Uhrbrock, 1959), and were as of 1958 located in the San
Francisco Bay area.

The present author, in a subsequent validity study (Porter, 1961)
of pre-1944 SGSB alumni wherever located (San Francisco Bay area
and elsewhere), included in the design a pay criterion and an MF
predictor both identical to the corresponding variables of
Williams’ (1959) study. No significant correlation was found
between those variables. The design also included five simple
criteria in addition to pay. The only one with which MF showed
significant correlation was “being in general management.”

1 Financial assistance for this research was provided by the
Western Management Science Institute, of the University of
California at Los Angeles Graduate School of Business
Administration. Computations were performed on the IBM 709 at the
University of California at Los Angeles Western Data Processing
Center. ... for access to the Strong Vocational Interest Blank
scores and for consultation on the research design.


TABLE 1

SIGNIFICANT VALIDITY COEFFICIENTS FOR SVIB MF SCORE

                                                     Criterion
Population or subgroup thereof           Policy      Level in       In general
                                         authority   organization   management

All pre-1944 MBAs (N = 195)                                         +18**
Large organization subgroup (N = 47)
Medium organization subgroup (N = 105)
Tiny organization subgroup (N = 43)

[Coefficients for the three subgroups are not legible.]

Note. The validity coefficients of the three other cr... 195 pre-1944
MBAs and three subgroups ther...
Decimals omitted; read 33 as —0.33

* Significant at 5% level
** Significant at 1% level







TABLE 2 


MEANS, STANDARD DEVIATIONS, AND RANGES FOR ALL VARIABLES


Population or subgroup thereof 


Large organization s/g (N = 47)


Variable SD M 


Pay 7 2.4 
Policy authority 0.90 
Level in organization 1.7 1.6 
1.4 0.72 
1.9 0.62 


0.28 


Job interest 
Career progress satisfaction 
Being in general management 0.08 


SVIB MF scale 49.9 10.5


195 pre-1944 MBA 
t f variable ranges for su 


Typical of alumni groups, the subjects were 
quite heterogeneous as to career circum- 
stances. It was felt that reanalysis of appro- 
priate population subgroups would thus be 
desirable. The present article reports research 
based on subgroups whose fractionation was 
according to size of subject’s employing or- 
ganization. 


METHOD 


Subjects. The 195 subjects forming the population of the author’s
earlier study (Porter, 1961) were separated into three subgroups by size
of employing organization: (a) large organization subgroup: 15,000
employees and over (N = 47); (b) medium organization subgroup: 50-14,999
employees (N = 105); and (c) tiny organization subgroup: fewer than 50
employees (N = 43).

Criteria. The six simple criteria were: pay (normalized for years since
award of MBA), policy-deciding authority, level in organization
(adjusted for organization size), job interest, career progress
satisfaction, and being in general management. Coding procedures and
their rationale are set forth in Porter, 1961.

Predictor. The predictor was the standard score on the SVIB MF scale.
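The subgrouping described above can be illustrated with a minimal sketch. It assumes each subject is recorded as a tuple of employer size, MF score, and one criterion score; the function names and the demonstration records are hypothetical and are not taken from the study. The validity coefficient is taken here to be an ordinary Pearson correlation, which is itself an assumption.

```python
from math import sqrt


def size_subgroup(employees):
    """Classify an employer using the cut-offs stated in the Method section."""
    if employees >= 15_000:
        return "large"
    if employees >= 50:
        return "medium"
    return "tiny"


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def validity_by_subgroup(records):
    """records: iterable of (employees, mf_score, criterion_score).

    Returns {subgroup name: correlation of MF score with the criterion}.
    """
    groups = {}
    for employees, mf, crit in records:
        xs, ys = groups.setdefault(size_subgroup(employees), ([], []))
        xs.append(mf)
        ys.append(crit)
    return {name: pearson_r(xs, ys) for name, (xs, ys) in groups.items()}


# Hypothetical records for illustration only; not data from the study.
demo = [
    (20_000, 40, 3), (30_000, 50, 2), (50_000, 60, 1),  # large employers
    (100, 40, 1), (5_000, 50, 2), (14_999, 60, 3),      # medium employers
    (10, 45, 1), (49, 55, 2), (5, 50, 2),               # tiny employers
]
coefficients = validity_by_subgroup(demo)
```

Computing the coefficient separately within each subgroup is what allows opposite signs in large and smaller organizations to appear instead of cancelling in the pooled population.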


RESULTS 


Table 1 shows, for the total population and for each of the three
subgroups, those validity coefficients significant at 5% or better.
Table 2 shows means, standard deviations, and ranges.
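As a rough illustration of how a correlation might be screened at the 5% and 1% levels for subgroups of different size, the sketch below uses the Fisher z approximation. The article does not state which significance test was applied, so the choice of test, the function name, and the example values of r are all assumptions.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for a Pearson r from n subjects.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate under the null hypothesis of zero
    correlation.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical coefficient, evaluated at the subgroup sizes from the Method section.
p_tiny = fisher_z_p_value(0.35, 43)    # same r, smallest subgroup
p_all = fisher_z_p_value(0.35, 195)    # same r, full population
```

The same coefficient yields a much smaller p-value in the full population than in a subgroup of 43, which is one reason subgroup coefficients need to be sizable before they reach significance.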


Medium organization s/g (N = 105)


All pre-1944 MBAs (N = 195)


Tiny organization s/g (N = 43)
SD WV ‘ V SD 
3.0 


0.91 


1.5 


3.2 
0.83 
1.5 
0.63 
0.66 0.63 
0.46 0.46 
| 5 8.8 




0.06 




DISCUSSION 

The pattern of validity coefficients sug- 
gests a hypothesis that masculinity of inter- 
ests was inversely correlated with executive 
success in large organizations, positively in 
small organizations. Such a hypothesis is 
weakened by the failure of MF to show as- 
sociation with pay, job interest, or career 
progress satisfaction in the full population or 
in any of the three subgroups thereof. 

It would seem, however, that two lines of further research are suggested
by the present findings: (a) what does MF “mean”, in view of Strong’s
(1955) observation that “we know almost nothing about what a given MF
score means in a man’s everyday life” (p. 125)? and (b) what other
effects of organization-size might be revealed through analysis of
organization-size-sorted subgroups in other executive-success validity
studies?
A FURTHER ANALYSIS OF THE RELATIONS AMONG JOB
PERFORMANCE AND SITUATIONAL VARIABLES 


EDWARD E. CURETON
University of Tennessee

RAYMOND A. KATZELL
New York University

In connection with an investigation of the relationships among employee
attitudes, performance, and characteristics of the situation, data were
collected from 72 divisions of a company. The variables included 5
measures of divisional performance and 5 descriptives of the situation.
An oblique-factor analysis of these variables results in 2 positively
correlated factors. The first factor is associated negatively with
divisional and community size, and positively with productivity and
profitability. The second is associated inversely with wage rate,
unionization, and proportion of male employees, and positively with
turnover. These results, in combination with some previously reported,
indicate that performance is related to 2 aspects of the degree of
urbanization of the situation.


This paper reports the results of an oblique factor analysis of the
correlations presented in Table 1 of the article by Katzell, Barrett,
and Parker (1961, p. 67). That table listed the intercorrelations among
five measures of job performance and five situational characteristics in
a sample of 72 warehousing divisions of a pharmaceutical company.
Table 1, below, shows the loadings of these variables on the two
resulting centroids when rotated to an oblique solution. The two factors
are positively correlated, the cosine of the angle between the reference
vectors being -.44. Factor I is associated positively with productivity
and inversely with size variables; Factor II is associated positively
with turnover and inversely with three other situational variables; it
may possibly constitute a female-employee syndrome.


TABLE 1

ROTATED OBLIQUE FACTOR LOADINGS
(N = 72)

Variable                          Factor I    Factor II

Performance variables
  Quantity
  Quality
  Profitability
  Product-value
  Turnover
Situational characteristics
  Size of work-force
  City size
  Wage rate
  Unionization
  Percentage male
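
The statement above that the factors are positively correlated although the cosine between the reference vectors is -.44 can be checked numerically. The following is a minimal sketch, assuming the standard relation in which the primary-factor correlation matrix is the inverse of the reference-vector correlation matrix rescaled to unit diagonal; the function name is hypothetical and the calculation is not taken from the article.

```python
from math import sqrt


def primary_factor_correlation(cos_ref):
    """Correlation between two primary factors, given the cosine of the
    angle between their two reference vectors."""
    c = cos_ref
    # Reference-vector correlation matrix is [[1, c], [c, 1]].
    det = 1.0 - c * c
    # Its inverse, written out for the 2 x 2 case.
    inv = [[1.0 / det, -c / det],
           [-c / det, 1.0 / det]]
    # Rescale the inverse so that its diagonal elements equal 1.
    d0 = 1.0 / sqrt(inv[0][0])
    d1 = 1.0 / sqrt(inv[1][1])
    return d0 * inv[0][1] * d1


phi = primary_factor_correlation(-0.44)
```

With only two factors the rescaled inverse simply reverses the sign of the cosine, so a reference-vector cosine of -.44 corresponds to a factor correlation of about +.44.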

It is of interest to compare these results 
with those yielded by the orthogonal factor 
analysis reported in the original article. There, 
Factor I emerged as a nearly general one, 
characterized by positive loadings on the pro- 
ductivity variables and negative loadings on 
the five situational characteristics. It was in- 
terpreted as portraying a small town or non- 
urban culture pattern. (Turnover was errone- 
ously listed in Table 2 of that article as 
having a loading of —.32; a recheck prompted 
by the present results showed that it has 
virtually no loading on this factor.) Factor 
II, which was not discussed there because it 
was both dim and not particularly relevant to 
the theoretical issue under discussion, gen- 
erally resembled Factor II as described in the 
present study. 

In summary, the oblique factor solution 
indicates that the nonurban culture pattern 
previously reported may be thought of as 
comprising two positively correlated facets: 
one reflects small size of plant and community 
and is associated with relatively high produc- 
tivity and profitability; the other reflects 
relatively low wages, proportionately few male 
employees and the absence of a union, and is 
associated with relatively high turnover. 